(54) Title: DNA MOLECULES AND PROTEIN DISPLAYING IMPROVED TRIAZINE COMPOUND DEGRADING ABILITY
(57) Abstract

This invention relates to the identification of homologs of atrazine chlorohydrolase and the use of these homologs to degrade
s-triazine-containing compounds. In particular, this invention includes the identification of homologs of atrazine chlorohydrolase encoded
by a DNA fragment having at least 95% homology to the nucleic acid sequence beginning at position 236 and ending
at position 1655 of SEQ ID NO:1, where the DNA fragment is capable of hybridizing under stringent conditions to SEQ ID NO:1 and where the
homolog has altered catalytic activity as compared with wild-type atrazine chlorohydrolase.

DNA MOLECULES AND PROTEIN DISPLAYING IMPROVED
TRIAZINE COMPOUND DEGRADING ABILITY

Background of the Invention 

More than 8 million organic compounds are known and many are
thought to be biodegradable by microorganisms, the principal agents for
recycling organic matter on Earth. In this context, microbial enzymes represent
the greatest diversity of novel catalysts. This is why microbial enzymes are
predominant in industrial enzyme technology and in bioremediation, whether
used as purified enzymes or in whole cell systems.

There is increased interest in engineering bacterial enzymes for
improved industrial performance. For example, site-directed mutagenesis of
subtilisin has resulted in the development of enzyme variants with improved
properties for use in detergents. Most applied enzymes, particularly those used
in biodegrading pollutants, however, are naturally evolved. That is, they are
unmodified from the form in which they were originally present in a soil
bacterium.

For example, most bioremediation is directed against petroleum
hydrocarbons, pollutants that are natural products and thus have provided
selective pressure for bacterial enzyme evolution over millions of years.
Synthetic compounds not resembling natural products are more likely to resist
biodegradation and hence accumulate in the environment. This changes over a
bacterial evolutionary time scale; compounds considered to be
non-biodegradable several decades ago, for example PCBs and
tetrachloroethylene, are now known to biodegrade. This is attributed to recent
evolution and dispersal of the newly evolved gene(s) throughout microbial
populations by mechanisms such as conjugative plasmids and transposable DNA
elements.

A better understanding of the evolution of new biodegradative
enzymes will reveal how nature cleanses the biosphere. Furthermore, the ability
to emulate the process in the laboratory may shave years off the lag period
between the introduction of a new molecular compound into the environment
and the development of a dispersed microbial antidote that will remove it.

Atrazine [2-chloro-4-(ethylamino)-6-(isopropylamino)-1,3,5-triazine]
is a widely used s-triazine (i.e., symmetric triazine) herbicide for the
control of broad-leaf weeds. Approximately 800 million pounds were used in
the United States between 1980 and 1990. As a result of this widespread use for
both selective and nonselective weed control, atrazine and other
s-triazine-containing compounds have been detected in ground and surface water in several
countries.

Numerous studies on the environmental fate of atrazine have
shown that atrazine is a recalcitrant compound that is transformed to CO2 very
slowly, if at all, under aerobic or anaerobic conditions. It has a water solubility
of 33 mg/l at 27°C. Its half-life (i.e., time required for half of the original
concentration to dissipate) can vary from about 4 weeks to about 57 weeks when
present at a low concentration (i.e., less than about 2 parts per million (ppm)) in
soil. High concentrations of atrazine, such as those occurring in spill sites, have
been reported to dissipate even more slowly.

As a result of its widespread use, atrazine is often detected in
ground water and soils in concentrations exceeding the maximum contaminant
level (MCL) of 3 µg/l (i.e., 3 parts per billion (ppb)), a regulatory level that took
effect in 1992. Point source spills of atrazine have resulted in levels as high as
25 ppb in some wells. Levels of up to 40,000 mg/l (i.e., 40,000 parts per million
(ppm)) atrazine have been found in the soil at spill sites more than ten years after
the spill incident. Such point source spills and subsequent runoff can cause crop
damage and ground water contamination.

There have been numerous reports on the isolation of s-triazine-degrading
microorganisms (see, e.g., Behki et al., J. Agric. Food Chem., 34,
746-749 (1986); Behki et al., Appl. Environ. Microbiol., 59, 1955-1959 (1993);
Cook, FEMS Microbiol. Rev., 46, 93-116 (1987); Cook et al., J. Agric. Food
Chem., 29, 1135-1143 (1981); Erickson et al., Critical Rev. Environ. Cont., 19,
1-13 (1989); Giardina et al., Agric. Biol. Chem., 44, 2067-2072 (1980); Jessee et
al., Appl. Environ. Microbiol., 45, 97-102 (1983); Mandelbaum et al., Appl.
Environ. Microbiol., 61, 1451-1457 (1995); Mandelbaum et al., Appl. Environ.
Microbiol., 59, 1695-1701 (1993); Mandelbaum et al., Environ. Sci. Technol.,
27, 1943-1946 (1993); Radosevich et al., Appl. Environ. Microbiol., 61, 297-302
(1995); and Yanze-Kontchou et al., Appl. Environ. Microbiol., 60, 4297-4302
(1994)). Many of the organisms described, however, failed to mineralize
atrazine (see, e.g., Cook, FEMS Microbiol. Rev., 46, 93-116 (1987); and Cook et
al., J. Agric. Food Chem., 29, 1135-1143 (1981)). While earlier studies have
reported atrazine degradation only by mixed microbial consortia, more recent
reports have indicated that several isolated bacterial strains can degrade atrazine.
In fact, research groups have identified atrazine-degrading bacteria classified in
different genera from several different locations in the U.S. (e.g., Minnesota,
Iowa, Louisiana, and Ohio) and Switzerland (Basel).

An atrazine-degrading bacterial culture, identified as Pseudomonas
sp. strain ADP (Mandelbaum et al., Appl. Environ. Microbiol., 61, 1451-1457
(1995); Mandelbaum et al., Appl. Environ. Microbiol., 59, 1695-1701 (1993); de
Souza et al., J. Bact., 178, 4894-4900 (1996); and Mandelbaum et al., Environ.
Sci. Technol., 27, 1943-1946 (1993)), was isolated and was found to degrade
atrazine at concentrations greater than about 1,000 µg/ml under growth and
non-growth conditions. See also, Radosevich et al., Appl. Environ. Microbiol., 61,
297-302 (1995) and Yanze-Kontchou et al., Appl. Environ. Microbiol., 60,
4297-4302 (1994). Pseudomonas sp. strain ADP (Atrazine-Degrading Pseudomonas)
uses atrazine as a sole source of nitrogen for growth. The organism completely
mineralizes the s-triazine ring of atrazine under aerobic growth conditions. That
is, this bacterium is capable of degrading the s-triazine ring and mineralizing
organic intermediates to inorganic compounds and ions (e.g., CO2).

The genes that encode the enzymes for MELAMINE
(2,4,6-triamino-s-triazine) metabolism have been isolated from a Pseudomonas sp.
strain. The genes that encode atrazine degradation activity have been isolated
from Rhodococcus sp. strains; however, the reaction results in the dealkylation of
atrazine. In addition, the gene that encodes atrazine dechlorination has been
isolated from a Pseudomonas sp. strain. See, for example, de Souza et al., Appl.
Environ. Microbiol., 61, 3373 (1995). The protein expressed by the gene
disclosed by de Souza et al. degrades atrazine, for example, at a Vmax of about
2.6 µmol of hydroxyatrazine per min per mg protein. Although this is
significant, it is desirable to obtain genes and the proteins they express that are
able to dechlorinate triazine-containing compounds with chlorine moieties at an
even higher rate and/or under a variety of conditions, such as, but not limited to,
conditions of high temperature (e.g., at least about 45°C and preferably at least
about 65°C), various pH conditions, and/or under conditions of high salt content
(e.g., about 20-30 g/L), or under other conditions in which the wild type enzyme
is not stable, efficient, or active. Similarly, it is desirable to obtain genes and
proteins encoded by these genes that degrade triazine-containing compounds
such as those triazine-containing compounds available under the trade names
"AMETRYN", "PROMETRYN", "CYANAZINE", "MELAMINE",
"SIMAZINE", as well as TERBUTHYLAZINE and
desethyldesisopropylatrazine. It is also desirable to identify proteins expressed
in organisms that degrade triazine-containing compounds in the presence of other
nitrogen sources such as ammonia and nitrate.

Summary of the Invention

The present invention provides isolated and purified DNA
molecules that encode atrazine-degrading enzymes similar to, but having
different catalytic activities from, a wild type (i.e., an isolated but naturally
occurring) atrazine chlorohydrolase. The term "altered enzymatic activities" is
used to refer to homologs of atrazine chlorohydrolase having altered catalytic
rates as quantitated by kcat and Km, improved ability to degrade atrazine, altered
substrate ranges, altered activities as compared to the native sequence in aqueous
solutions, altered stability in solvents, altered active temperature ranges or
altered reaction conditions such as salt concentration, pH, improved activity in a
soil environment and the like, as compared with the wild-type atrazine
chlorohydrolase (AtzA) protein.

In one preferred embodiment, the present invention provides DNA
fragments encoding a homolog of atrazine chlorohydrolase and comprising the
sequence selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4,
SEQ ID NOS:7-11, and SEQ ID NOS:17-21. In one embodiment the invention
relates to these DNA fragments in a vector, preferably an expression vector.
Further, the invention relates to the DNA fragment in a cell. In one embodiment
the cell is a bacterium and in a preferred embodiment, the bacterium is E. coli.

The invention also relates to s-triazine-degrading proteins having at
least one amino acid different from the protein of SEQ ID NO:2, wherein the
coding region of the nucleic acid encoding the s-triazine-degrading protein has at
least 95% homology to SEQ ID NO:1 and wherein the s-triazine-degrading
protein has an altered catalytic activity as compared with the protein having the
sequence of SEQ ID NO:2. In one embodiment, the protein is selected from the
group consisting of SEQ ID NOS:5, 6 and 22-26. In one embodiment the
substrate for the s-triazine-degrading protein is ATRAZINE. In another
embodiment the substrate for the s-triazine-degrading protein is
TERBUTHYLAZINE and in yet another embodiment the substrate for the
s-triazine-degrading protein is MELAMINE. In another embodiment this
invention relates to a remediation composition comprising a cell producing at
least one s-triazine-degrading protein having at least one amino acid different
from the protein of SEQ ID NO:2, wherein the coding region of the nucleic acid
encoding the s-triazine-degrading protein has at least 95% homology to SEQ ID
NO:1 and wherein the s-triazine-degrading protein has an altered catalytic
activity as compared with the protein having the sequence of SEQ ID NO:2. In a
preferred embodiment the composition is suitable for treating soil or water. In
another embodiment the remediation composition comprises at least one
s-triazine-degrading protein having at least one amino acid different from the
protein of SEQ ID NO:2, wherein the coding region of the nucleic acid encoding
the s-triazine-degrading protein has at least 95% homology to SEQ ID NO:1 and
wherein the s-triazine-degrading protein has an altered catalytic activity as
compared with the protein having the sequence of SEQ ID NO:2. In a preferred
embodiment this composition is also suitable for treating soil or water. In one
embodiment the remediation composition comprises the protein bound to an
immobilization support. In yet another embodiment, these proteins are
homotetramers, such as the homotetramers formed by AtzA.

In another embodiment the invention relates to a protein selected
from the group consisting of proteins comprising the amino acid sequences of
SEQ ID NOS:5, 6 and 22-26.

In another aspect of this invention, the invention relates to a DNA
fragment having a portion of its nucleic acid sequence having at least 95%
homology to a nucleic acid sequence beginning at position 236 and ending at
position 1655 of SEQ ID NO:1, wherein the DNA fragment is capable of
hybridizing under stringent conditions to SEQ ID NO:1 and wherein there is at
least one amino acid change in the protein encoded by the DNA fragment as
compared with SEQ ID NO:2 and wherein the protein encoded by the DNA
fragment is capable of dechlorinating at least one s-triazine-containing
compound and has a catalytic activity different from the enzymatic activity of
the protein of SEQ ID NO:2. In one embodiment the s-triazine-containing
compound is ATRAZINE, TERBUTHYLAZINE, or MELAMINE. In one
embodiment.

The invention also relates to a method for treating a sample
comprising an s-triazine-containing compound, comprising the step of
adding a protein to a sample comprising an s-triazine-containing compound,
wherein the protein is encoded by a gene having at least a portion of the nucleic
acid sequence of the gene having at least 95% homology to the sequence
beginning at position 236 and ending at position 1655 of SEQ ID NO:1, wherein
the gene is capable of hybridizing under stringent conditions to SEQ ID NO:1,
wherein there is at least one amino acid change in the protein encoded by the
DNA fragment as compared with SEQ ID NO:2 and wherein the protein has an
altered catalytic activity as compared to the protein having the amino acid
sequence of SEQ ID NO:2. In one embodiment, the composition comprises
bacteria expressing the protein. In one embodiment the s-triazine-containing
compound is atrazine, in another the s-triazine-containing compound is
TERBUTHYLAZINE and in another the s-triazine-containing compound is
MELAMINE (2,4,6-triamino-s-triazine). In one embodiment, the protein encoded by the gene
is selected from the group consisting of SEQ ID NOS:5, 6 and 22-26.



In another aspect, this invention relates to a method for obtaining
homologs of an atrazine chlorohydrolase comprising the steps of obtaining a
nucleic acid sequence encoding atrazine chlorohydrolase; mutagenizing the
nucleic acid to obtain a modified nucleic acid sequence that encodes a protein
having an amino acid sequence with at least one amino acid change relative to
the amino acid sequence of the atrazine chlorohydrolase; screening the proteins
encoded by the modified nucleic acid sequence; and selecting proteins with
altered catalytic activity as compared to the catalytic activity of the atrazine
chlorohydrolase. Preferably, the atrazine chlorohydrolase nucleic acid sequence
is SEQ ID NO:1. In one embodiment the altered catalytic activity is an
improved ability to degrade ATRAZINE. In another embodiment, the altered
catalytic activity is an altered substrate activity.

Other homologs with an improved rate of catalytic activity for
atrazine include clones A40, A42, A44, A46 and A60 having nucleic acid
sequences (SEQ ID NOS:17-21, respectively). Other homologs capable of better
degrading TERBUTHYLAZINE include A42, A44, A46 and A60 as well as A11
and A13.



Brief Description of the Drawings

Fig. 1. Nucleotide sequence alignment of wild type atzA (bottom
sequence) from Pseudomonas sp. strain ADP and clone (A7) (SEQ ID NO:1 and
SEQ ID NO:3).

Fig. 2. Nucleotide sequence alignment of wild type atzA (bottom
sequence) from Pseudomonas sp. strain ADP and clone (T7) (SEQ ID NO:1 and
SEQ ID NO:4).

Fig. 3. Amino acid sequence alignment of wild type AtzA (bottom
sequence) from Pseudomonas sp. strain ADP and clone (A7) (SEQ ID NO:2 and
SEQ ID NO:5).

Fig. 4. Amino acid sequence ... Pseudomonas sp. strain ...

Fig. 5. Nucleotide sequence alignment of wild type atzA (SEQ ID
NO:1; bottom sequence) from Pseudomonas sp. strain ADP and clone (A11).
Fig. 5(a) provides the sequence from nucleic acids 11-543 (SEQ ID NO:7). Fig.
5(b) provides the sequence from nucleic acids 454-901 (SEQ ID NO:8). Fig. 5(c)
provides the sequence from 1458-1851 (SEQ ID NO:9; N in this sequence
indicates that this nucleotide has not been verified) and Fig. 5(d) provides the
sequence from nucleic acids 1125-1482 (SEQ ID NO:10) of clone A11. The "N"
in these sequences refers to nucleic acids that are being verified.

Fig. 6. Nucleotide sequence alignment of a portion of the nucleic
acid sequence of wild type atzA from Pseudomonas sp. strain ADP and nucleic
acids 436-963 of clone (A13) (SEQ ID NO:11 and SEQ ID NO:1).

Fig. 7 is a histogram illustrating the TERBUTHYLAZINE
degradative ability of two homologs of this invention (T7 = sample 3 and A7 =
sample 4). Fig. 7(a) illustrates the % of TERBUTHYLAZINE remaining after
exposure to AtzA or a homolog. Fig. 7(b) illustrates the relative amount of
hydroxyterbuthylazine as a measure of TERBUTHYLAZINE degradation.

Fig. 8 is another set of histograms illustrating the TERBUTHYLAZINE
degradative ability of three homologs A7, A11, and T7. Fig. 8(a) provides the
% of TERBUTHYLAZINE remaining after a 15 minute exposure to the homolog
in the presence or absence of the metals and additives of Samples 1-10. Fig.
8(b) provides the relative amount of hydroxyterbuthylazine in the presence or
absence of the metals and compounds of Samples 1-10.

Fig. 9 is a comparison of PCR-amplified fragments, obtained using two
primers of the atrazine chlorohydrolase gene, from 6 different types of bacteria:
Pseudomonas sp. strain ADP; Ralstonia strain M91-3; Clavibacter (Clav.);
Agrobacterium strain J14(a); ND (an organism with no genus assigned) strain
38/38; and Alcaligenes strain SG1 (SEQ ID NOS:12-16).

Detailed Description of the Invention

The present invention provides isolated and purified DNA molecules,
and isolated and purified proteins, involved in the degradation of
s-triazine-containing compounds. The proteins encoded by the genes of this
invention are involved in the dechlorination and/or the deamination of
s-triazine-containing compounds. The wild type AtzA protein can catalyze the
dechlorination of s-triazine-containing compounds but not the deamination of
these compounds. The dechlorination reaction occurs on s-triazine-containing
compounds that include a chlorine atom and at least one alkylamino side chain.
Such compounds have the following general formula:

[Structural formula: an s-triazine (1,3,5-triazine) ring bearing the substituents R1, R2 and R3 on its three ring carbon atoms]

wherein R1 = Cl, R2 = NR4R5 (wherein R4 and R5 are each independently H or a
C1-3 alkyl group), and R3 = NR6R7 (wherein R6 and R7 are each independently H
or a C1-3 alkyl group), with the proviso that at least one of R2 or R3 is an
alkylamino group. As used herein, an "alkylamino" group refers to an amine
side chain with one or two alkyl groups attached to the nitrogen atom. Examples
of such compounds include atrazine (2-chloro-4-ethylamino-6-isopropylamino-
1,3,5-s-triazine), desethylatrazine (2-chloro-4-amino-6-isopropylamino-s-triazine),
desisopropylatrazine (2-chloro-4-ethylamino-6-amino-s-triazine), and
SIMAZINE (2-chloro-4,6-diethylamino-s-triazine).

Triazine degradation activity is encoded by a gene that is localized
to a 21.5-kb EcoRI fragment, and more specifically to a 1.9-kb AvaI fragment, of
the genome of Pseudomonas sp. ADP (ADP is the strain designation for
Atrazine-Degrading Pseudomonas) bacterium. Specifically, these genomic fragments
encode proteins involved in s-triazine dechlorination. The rate of degradation of
atrazine that results from the expression of these fragments in E. coli is
comparable to that seen for native Pseudomonas sp. strain ADP; however, in
contrast to what is seen with native Pseudomonas sp. strain ADP, this
degradation in E. coli is unaffected by the presence of inorganic nitrogen sources
like ammonium chloride. This is advantageous for regions
contaminated with nitrogen-containing fertilizers or herbicides, for example.
The expression of atrazine degradation activity in the presence of inorganic
nitrogen compounds broadens the potential use of recombinant organisms for
biodegradation of atrazine in soil and water.

Hydroxyatrazine formation in the environment was previously
thought to result solely from the chemical hydrolysis of atrazine (Armstrong et
al., Environ. Sci. Technol., 2, 683-689 (1968); deBruijn et al., Gene, 27, 131-149
(1984); and Nair et al., Environ. Sci. Technol., 26, 1627-1634 (1992)). Previous
reports suggest that the first step in atrazine degradation by environmental
bacteria is dealkylation. Dealkylation produces a product that retains the
chloride moiety and is likely to retain its toxicity in the environment. In contrast
to these reports, AtzA dechlorinates atrazine and produces a detoxified product
in a one-step detoxification reaction that is amenable to exploitation in the
remediation industry. There remains a need for atrazine-degrading enzymes with
improved activity.

As used herein, the gene encoding a protein capable of
dechlorinating atrazine and originally identified in Pseudomonas sp. strain ADP
and expressed in E. coli is referred to as "atzA", whereas the protein that it
encodes is referred to as "AtzA." Examples of the cloned wild type gene
sequence and the amino acid sequence derived from the gene sequence are
provided as SEQ ID NO:1 and SEQ ID NO:2 respectively. As also used herein,
the terms atrazine chlorohydrolase (AtzA) protein, atrazine chlorohydrolase
enzyme, or simply atrazine chlorohydrolase, are used interchangeably, and refer
to an atrazine chlorohydrolase enzyme involved in the degradation of atrazine
and similar molecules as discussed above.

A "homolog" of atrazine chlorohydrolase is an enzyme derived
from the gene sequence encoding atrazine chlorohydrolase where the protein
sequence encoded by the gene is modified by amino acid deletion, addition,
substitution, or truncation but that nonetheless is capable of dechlorinating or
deaminating s-triazine-containing compounds. In addition, the homolog of
atrazine chlorohydrolase (AtzA) has a nucleic acid sequence that is different
from the atzA sequence (SEQ ID NO:1) and produces a protein with modified
biological properties or, as used herein, "altered enzymatic activities." These
homologs include those with altered catalytic rates as quantitated by kcat and Km,
altered substrate ranges, altered activities as compared to the native sequence in
aqueous solutions, altered stability in solvents, altered active temperature ranges
or altered reaction conditions such as salt concentration, pH, improved activity in
a soil environment, and the like, as compared with the wild-type atrazine
chlorohydrolase (AtzA) protein. Thus, provided that two molecules possess
enzymatic activity toward an s-triazine-containing substrate and one molecule has the
gene sequence of atzA (SEQ ID NO:1), the other is considered a homolog of that
sequence where 1) the gene sequence of the homolog differs from SEQ ID NO:1
such that there is at least one amino acid change relative to the protein encoded by SEQ
ID NO:1 (i.e., SEQ ID NO:2); 2) the homolog has different enzymatic
characteristics from the protein encoded by SEQ ID NO:1 such as, but not
limited to, an altered substrate preference, altered rate of activity, or altered
conditions for enzymatic activity such as temperature, pH, salt concentration or
the like, as discussed supra; and 3) the coding region of the nucleic acid
sequence encoding the variant protein has at least 95% homology to SEQ ID
NO:1.
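
The three-part test for a homolog can be sketched in code. The following is a minimal illustration only, not the procedure of this specification: it assumes that "homology" is scored as percent identity over sequences that have already been aligned to equal length, and it takes the altered-activity finding as an input supplied by the user. The function names and the short sequences in the usage note are hypothetical placeholders, not SEQ ID NO:1 or SEQ ID NO:2.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if not ref or len(ref) != len(other):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b for a, b in zip(ref.upper(), other.upper()))
    return 100.0 * matches / len(ref)


def amino_acid_changes(ref_protein: str, other_protein: str) -> int:
    """Count positions that differ between two pre-aligned protein sequences."""
    if len(ref_protein) != len(other_protein):
        raise ValueError("proteins must be aligned to equal length")
    return sum(a != b for a, b in zip(ref_protein.upper(), other_protein.upper()))


def meets_homolog_criteria(wt_cds: str, variant_cds: str,
                           wt_protein: str, variant_protein: str,
                           altered_activity: bool,
                           min_identity: float = 95.0) -> bool:
    """Apply the three conditions: (1) at least one amino acid change,
    (2) altered enzymatic characteristics (supplied by the caller, e.g. from an assay),
    (3) coding-region identity of at least `min_identity` percent."""
    return (amino_acid_changes(wt_protein, variant_protein) >= 1
            and altered_activity
            and percent_identity(wt_cds, variant_cds) >= min_identity)
```

For example, with a hypothetical 20-nucleotide coding region that differs from its parent at one position (95% identity) and one resulting amino acid change, `meets_homolog_criteria(..., altered_activity=True)` returns `True`; an unchanged sequence fails condition 1.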

As used herein, the terms "isolated and purified" refer to the
isolation of a DNA molecule or protein from its natural cellular environment,
and from association with other coding regions of the bacterial genome, so that it
can be sequenced, replicated, and/or expressed. Preferably, the isolated and
purified DNA molecules of the invention comprise a single coding region. Thus,
the present DNA molecules are preferably those consisting of a DNA segment
encoding a homolog of atrazine chlorohydrolase.

Using the nucleic acid encoding the wild-type atzA sequence and
the amino acid sequence of the wild-type enzyme AtzA, similar
atrazine-degrading enzymes were identified in other bacteria. In fact, sequencing of the
atzA gene in the other bacteria demonstrated a homology of at least 99% to the
atzA sequence, suggesting little evolutionary drift (see SEQ ID NOS:12-16).
Homologs of the atzA gene could not be identified in the genomes of bacteria
that did not metabolize atrazine. This information supports the theory that the
atzA gene evolved to metabolize s-triazine-containing compounds.

The studies assessing the prevalence and homology of the atzA
gene in a variety of bacterial genera also suggest that atzA is likely to be a
relatively young, i.e., recently evolved, gene. That the gene is recently evolved is
supported by the attributes of the gene and the protein encoded by the gene. For
example: (i) the encoded enzyme has a limited s-triazine substrate range that includes atrazine and the
structurally analogous herbicide SIMAZINE, but it does not act on all s-triazines;
(ii) the gene has a high sequence homology to genes isolated from other bacteria
that produce proteins with atrazine-degrading activity; (iii) the gene is not organized with
the atzB and atzC genes in a contiguous arrangement such as an operon; (iv) the
gene lacks the type of coordinate genetic regulation seen, for example, in
aromatic hydrocarbon biodegradative pathway genes; (v) the wild-type gene was
isolated from a spill site containing high atrazine levels; and (vi) it is suggested to
have been environmentally undetectable until the last few years.

Genes involved in reactions common to most bacteria and 
mammals are more highly evolved and have attained catalytic proficiency closer 
to theoretical perfection. Genes that have evolved more recently have not had the 
evolutionary opportunity to maximize the level of catalytic efficiency that they 
could theoretically obtain. These enzymes are suboptimal. Suboptimal enzymes 
include enzymes that have a second order rate constant, kcat/Km, that is orders of
magnitude below the diffusion-controlled limit of enzyme catalysis, 3 x 10^8
M^-1 s^-1. These enzymes have the potential to evolve higher kcat, lower Km, or both.
Enzymes with higher kcat, lower Km, or both would appear to have selective
advantage as a biodegradative enzyme because less enzyme with higher activity
would serve the same metabolic need and conserve ATP expended in enzyme
biosynthesis. Optimized enzymes have the further advantage of having an
improved commercial value resulting from their improved efficiency or
improved activity under a defined set of conditions.

Thus, the atzA gene is, potentially, an s-triazine compound-degrading
progenitor with the potential for improvement and modification.
AtzA is a candidate for studies to generate homologs with improved activity, i.e.,
enhanced rate, altered pH preference, salt concentration and the like. The kcat/Km
for atrazine chlorohydrolase purified from Pseudomonas ADP is 5 x 10^5 M^-1 s^-1, 3
orders of magnitude below the theoretical catalytic limit. That all of the atzA
homologous genes from a survey of atrazine-degrading bacteria are so
structurally and catalytically similar suggests that the atzA gene and the AtzA
protein can be improved and will be improved naturally over time. Indeed, most
biodegradative enzymes are orders of magnitude below diffusion-limited
enzyme rates and, under this hypothesis, are also candidates for gene and protein
modifications.
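
The comparison between an enzyme's second order rate constant and the diffusion-controlled limit is a short calculation. The sketch below is illustrative only. It assumes the limit is taken as 3 x 10^8 M^-1 s^-1, and the kcat and Km values in the usage lines are hypothetical numbers chosen for the arithmetic, not measurements from this specification.

```python
import math

# Diffusion-controlled limit in M^-1 s^-1 (assumed value for this sketch).
DIFFUSION_LIMIT = 3e8


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Second order rate constant kcat/Km in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


def orders_below_limit(efficiency: float, limit: float = DIFFUSION_LIMIT) -> float:
    """How many orders of magnitude `efficiency` lies below `limit`."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return math.log10(limit / efficiency)


# Hypothetical enzyme: kcat = 10 s^-1, Km = 100 micromolar.
eff = catalytic_efficiency(10.0, 1e-4)
print(f"kcat/Km = {eff:.3g} M^-1 s^-1, "
      f"{orders_below_limit(eff):.2f} orders of magnitude below the limit")
```

An enzyme that scores several orders of magnitude below the limit is, under the hypothesis stated above, a candidate for improvement through a higher kcat, a lower Km, or both.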

In one embodiment of this invention, a method is disclosed for
selecting or screening modified and improved atzA gene sequences that encode
protein with improved enzymatic activity, whether the activity is enzymatic rate,
using atrazine as a substrate, as compared to the wild-type sequence, or
improved activity under any of a variety of reaction conditions including, but
not limited to, elevated temperature, salt concentration, altered substrate range,
solvent conditions, pH ranges, tolerance or stability to a variety of environmental
conditions, or other reaction conditions that may be useful in bioremediation
processes. The method preferably includes the steps of obtaining the wild-type
atzA gene sequence, mutagenizing the gene sequence to obtain altered atzA
sequences, selecting or screening for clones expressing altered AtzA activity and
selecting gene sequences encoding AtzA protein with improved
s-triazine-degrading activity.
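
The obtain, mutagenize, screen and select steps can be pictured as a simple loop. The following toy simulation is a sketch under stated assumptions and is not the laboratory procedure of this invention: mutagenesis is modeled as random point substitution, and the `assay` callable stands in for whatever activity measurement is used. All names (`mutagenize`, `screen_library`, the example parent string and the example scoring function) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

BASES = "ACGT"


def mutagenize(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of `seq` in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with a base different from the original one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def screen_library(parent: str,
                   assay: Callable[[str], float],
                   library_size: int,
                   rate: float,
                   seed: int = 0) -> List[Tuple[str, float]]:
    """Build a variant library from `parent`, score every variant with `assay`,
    and return the variants that outscore the parent, best first."""
    rng = random.Random(seed)
    baseline = assay(parent)
    library = {mutagenize(parent, rate, rng) for _ in range(library_size)}
    library.discard(parent)  # unmodified copies are not variants
    scored = [(variant, assay(variant)) for variant in library]
    improved = [(v, s) for v, s in scored if s > baseline]
    return sorted(improved, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder parent and placeholder "activity" (count of G); a real screen
    # would express each clone and measure s-triazine degradation instead.
    hits = screen_library("ATGCATGCATGC", lambda s: float(s.count("G")),
                          library_size=200, rate=0.1, seed=1)
    print(len(hits), "variants outscored the parent")
```

The selected variants can themselves serve as parents for a further round, which corresponds to mutagenizing a previously altered sequence rather than the wild type.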

As a first step for practicing the method of this invention, the
wild-type atzA sequence (SEQ ID NO:1) is incorporated into a vector or into nucleic
acid that is suitable for a particular mutagenesis procedure. The wild type atzA
gene was first obtained as a 1.9-kb AvaI genomic fragment that encodes an
enzyme that transforms atrazine to hydroxyatrazine, termed atrazine
chlorohydrolase. Methods for obtaining this fragment are disclosed by de Souza
et al. (Appl. Environ. Microbiol., 61, 3373-3378 (1995)). The gene, atzA, has one
large ORF (open reading frame) and produces a translation product of about 473
amino acids. A particularly constant portion of this gene appears to begin at
position 236 and end at position 1655 of SEQ ID NO:1. The wild-type atzA
gene from Pseudomonas strain ADP includes 1419 nucleotides and encodes a
polypeptide of 473 amino acids with an estimated Mr of 52,421 and a pI of 6.6.
The gene also includes a typical Pseudomonas ribosome binding site, beginning
with GGAGA, located 11 bp upstream from the proposed start codon. A
potential stop codon is located at position 1655.
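
The figures quoted in this paragraph are mutually consistent, as a short check shows. The sketch assumes 1-based numbering in which position 236 is the first base of the start codon and position 1655 is the first base of the stop codon; that reading is an interpretation of the text, and the function name is illustrative.

```python
def coding_length(start: int, stop_codon_start: int) -> int:
    """Number of coding nucleotides from `start` up to, but not including, the stop codon."""
    return stop_codon_start - start


ORF_START = 236          # position of the start codon, as given in the text
STOP_CODON_START = 1655  # position of the potential stop codon, as given in the text

n_nucleotides = coding_length(ORF_START, STOP_CODON_START)
n_codons, remainder = divmod(n_nucleotides, 3)
print(n_nucleotides, n_codons, remainder)  # prints "1419 473 0"
```

The 1419 coding nucleotides divide evenly into 473 codons, matching the stated polypeptide length of 473 amino acids.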

The wild-type atzA sequence can be obtained from a variety of
sources including a DNA library, containing either genomic or plasmid DNA
obtained from bacteria believed to possess the atzA DNA. Alternatively, the
original isolate identified as containing the atzA DNA is described in U.S. Pat.
No. 5,508,193 and can be accessed as a deposit from the American Type Culture
Collection (ATCC No. 55464, Rockville, Maryland). Libraries can be screened
using oligonucleotide probes, for example, to identify the DNA corresponding to
SEQ ID NO:1. SEQ ID NO:1 can also be obtained by PCR (polymerase chain
reaction) using primers selected using SEQ ID NO:1 and the nucleic acid
obtained from the atzA-containing organism (ATCC No. 55464) deposited with
the American Type Culture Collection.

Screening DNA libraries or amplifying regions from prokaryotic DNA using
synthetic oligonucleotides is a preferred method to obtain the wild-type
sequence of this invention. The oligonucleotides should be of sufficient
length and sufficiently nondegenerate to minimize false positives. In a
preferred strategy, the actual nucleotide sequence(s) of the probe(s) is
designed based on regions of the atzA DNA, preferably outside of the
reading frame of the gene (the translated reading frame begins at position
236 and ends at position 1655 of SEQ ID NO:1), that have the least codon
redundancy.

Cloning of the open reading frame encoding atzA into the appropriate
replicable vectors allows expression of the gene product, the AtzA enzyme,
and makes the coding region available for further genetic engineering.

The types of mutagenesis procedures that are capable of generating a
variety of gene sequences based on a parent sequence, atzA or a previously
mutagenized or altered sequence of atzA, are known in the art and each
method has a preferred vector format. In general, the mutagenesis procedure
selected is one that generates at least one modified atzA sequence and
preferably a population of modified atzA gene sequences. Selecting or
screening procedures are used to identify preferred modified enzymes (i.e.,
homologs) from the pool of modified sequences.

There are a number of methods in use for creating mutant proteins in a
library format from a parent sequence. These include the polymerase chain
reaction (Leung, D.W. et al. Technique 1:11-15 (1989), Bartel, D.P. et al.
Science 261:1411-1418 (1993)), cassette mutagenesis (Arkin, A. et al. Proc.
Natl. Acad. Sci. USA 89:7811-7815 (1992), Oliphant, A.R. et al., Gene
44:177-183 (1986), Hermes, J.D. et al., Proc. Natl. Acad. Sci. USA
87:696-700 (1990), Delagrave et al. Protein Engineering 6:327-331 (1993),
Delagrave et al. Bio/Technology 11:1548-1552 (1993), and Goldman, E.R. et
al., Bio/Technology 10:1557-1561 (1992)), as well as methods that exploit
the standard polymerase chain reaction, including, but not limited to, DNA
recombination during in vitro PCR (Meyerhans, A. et al., Nucl. Acids Res.
18:1687-1691 (1990), and Marton et al. Nucl. Acids Res. 19:2423-2426
(1991)), in vivo site specific recombination (Nissim et al. EMBO J.
13:692-698 (1994), Winter et al. Ann. Rev. Immunol. 12:433-55 (1994)),
overlap extension and PCR (Hayashi et al. Biotechniques 17:310-315
(1994)), applied molecular evolution systems (Bock, L.C. et al., Nature
355:564-566 (1992), Scott, J.K. et al., Science 249:386-390 (1990),
Cwirla, S.E. et al. Proc. Natl. Acad. Sci. USA 87:6378-6382 (1990),
McCafferty, J. et al. Nature 348:552-554 (1990)), DNA shuffling systems,
including those reported by Stemmer et al. (Nature 370:389-391 (1994) and
Proc. Natl. Acad. Sci. (USA) 91:10747-10751 (1994) and International
Patent Application Publication Number WO 95/22625), and random in vivo
recombination (see Caren et al. Bio/Technology 12:433-55 (1994), Calogero
et al. FEMS Microbiology Lett. 97:41-44 (1992), International Patent
Application Publication Numbers WO91/01087 to Galizzi and WO90/07576 to
Radman et al.).

Preferably, the method produces libraries with large numbers of mutant
nucleic acid sequences that can be easily screened or selected without
undue experimentation. Those skilled in the art will recognize that
screening and/or selection methods are well documented in the art and those
of ordinary skill in the art will be able to use the cited methods as well
as other references similarly describing the afore-mentioned methods to
produce pools of variant sequences. Preferred strategies include methods
for screening for degradative activity of the s-triazine-containing
compound on nutrient plates containing the homolog-encoding bacteria or by
use of colorimetric assays to detect the release of chlorine ions.
Preferred selection assays include methods for selecting for
homolog-containing bacterial growth on or in an s-triazine-containing
medium.

In a preferred method of this invention, gene shuffling, also termed
recursive sequence recombination, is used to generate a pool of mutated
sequences of the atzA gene. In this method the atzA gene, alone or in
combination with the atzB gene, is amplified, such as by PCR, or,
alternatively, multiple copies of the gene sequence (atzA and atzB) are
isolated and purified. The gene sequence is cut into random fragments using
enzymes known in the art, including DNAase I. The fragments are purified
and incubated with single or double-stranded oligonucleotides where the
oligonucleotides comprise an area of identity and an area of heterology to
the template gene or gene sequence. The resulting mixture is denatured and
incubated with a polymerase to produce annealing of the single-stranded
fragments at regions of identity between the single-stranded fragments.
Strand elongation results in the formation of a mutagenized double-stranded
polynucleotide. These steps are repeated at least once. In this gene
shuffling technique, recombination occurs between substantially homologous,
but non-identical, sequences of the atzA gene. In the studies provided in
Example 2, the atzB gene was not gene-shuffled.

In the technique published by Stemmer et al. (Nature, supra), the
reassembled product is amplified by PCR and cloned into a vector. Clones
containing the shuffled gene are next used in selection or screening
assays. Example 2 discloses the use of a gene shuffling technique to
generate pools of modified atzA sequences. The gene shuffling technique of
Example 2 was modified based on the Stemmer et al. references. In this
technique, an entire plasmid containing the atzA and atzB genes in a vector
was treated with DNAase I and fragments between 500 and 2000 bp were gel
purified. The fragments were assembled in a PCR reaction as provided in
Example 2.
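The size selection step can be pictured as a simple filter over fragment lengths. The sketch below is only an illustration: the fragment lengths are hypothetical, the bounds are treated as inclusive (an assumption, since a gel cut is not that sharp), and the function name is not taken from the disclosure.

```python
# Sketch: keep only DNAase I fragments inside the 500-2000 bp window
# that is gel purified before reassembly.
# Assumption: both bounds are treated as inclusive.

MIN_BP = 500
MAX_BP = 2000


def select_fragments(lengths_bp, min_bp=MIN_BP, max_bp=MAX_BP):
    """Return the fragment lengths that fall inside the size window."""
    return [n for n in lengths_bp if min_bp <= n <= max_bp]


# Hypothetical digest of the plasmid; 8400 represents undigested plasmid.
example_lengths = [120, 500, 850, 1999, 2000, 2400, 8400]
kept = select_fragments(example_lengths)
print(kept)
```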
Once intact gene sequences are reassembled, they are incorporated into a
vector suitable for expressing protein encoded by the reassembled nucleic
acid, or as provided in Example 1, where the gene sequences are already in
a vector, the vector can be incorporated directly into an organism suitable
for replicating the vector. The vector containing the atzA gene is also
preferably incorporated into a host suitable for expressing the atzA gene.
The host, generally an E. coli species, is used in assays to screen or
select for clones expressing the AtzA protein under defined conditions. The
type of organism can be matched to the mutagenesis procedure and in Example
2, a preferred organism was the E. coli strain NM522.
The assays suitable for use in this invention can take any of a variety of
forms for determining whether the organism containing the variant atzA
sequences expresses an enzyme capable of dechlorinating or deaminating
s-triazine compounds. Therefore, the types of assays that could be used in
this invention include assays that monitor the degradation of
s-triazine-containing compounds including ATRAZINE, SIMAZINE or MELAMINE
using any of a variety of methods including, but not limited to, HPLC
analysis to assess substrate degradation; monitoring clearing of
precipitable s-triazine-containing substrates, such as atrazine or
TERBUTHYLAZINE, on solid media by bacteria containing the homologs of this
invention; growth assays in media containing soluble substrate; monitoring
the amount of chlorine released, as described by Bergmann et al., Anal.
Chem. 29, 241-243 (1957), or the amount of nitrogen released; evaluating
the derivatized product using gas chromatography and/or mass spectroscopy;
solid agar plate assays with varied salt, pH, substrate, solvent, or
temperature conditions; colorimetric assays such as those provided by
Epstein, J. ("Estimation of Microquantitation of Cyanide", (1947)
Analytical Chemistry 19(4):272-276) and Habig and Jakoby ("Assays for
Differentiation of Glutathione S-transferases", Methods in Enzymology
77:398-405); as well as radiolabelled assays to assess, for example, the
release of radiolabel as a result of enzymatic activity.

In a preferred assay, clones are tested for their ability to degrade
s-triazine-containing compounds such as atrazine, SIMAZINE, TERBUTHYLAZINE
(2-chloro-4-(ethylamino)-6-(tertiary butyl-amino)-1,3,5-triazine),
desethylatrazine, desisopropylatrazine, MELAMINE, and the like. In these
assays, atrazine, or another insoluble s-triazine-containing substrate, is
incorporated into a nutrient agar plate as the sole nitrogen source.
Concentrations of atrazine or other s-triazine-containing compounds can
vary in the plate from about 300 µg/ml to at least about 1000 µg/ml and in
a preferred embodiment about 500 µg/ml atrazine is used on the plate. Many
s-triazines are relatively insoluble in water and a suspension in an agar
plate produces a cloudy appearance. Bacteria capable of metabolizing the
insoluble s-triazine-containing compounds produce a clearing on the cloudy
agar plate. An exemplary assay is a modification of the assay disclosed by
Mandelbaum et al. (Appl. Environ. Microbiol. 61:1451-1457 (1995)) and is
provided in Example 2. In these assays LB medium can be used with the
atrazine because E. coli expressing AtzA homologs support
atrazine-degrading activity in the presence of other nitrogen sources. The
assay demonstrates atrazine degradation by observing clearing zones
surrounding clones expressing homologs of AtzA.

Clones are selected from the insoluble substrate assay based on their
ability to produce, for example, a clearing in the substrate-containing
plates. Similarly, assay conditions can be modified, such as, but not
limited to, salt, pH, solvent, temperature, and the like, to select clones
encoding AtzA homologs capable of degrading a substrate under a variety of
test conditions. For example, the pH of the assay can be altered to a pH
range of about 5 to about 9. These assays would likely use isolated homolog
protein to permit an accurate assessment of the effect of pH. The assay, or
a modification of the assay, suitable for elevated temperatures (such as a
soluble assay) can employ elevated temperature ranges, for example, between
about 50°C to about 80°C. The assays can also be modified to include
altered salt concentrations including conditions equivalent to salt
concentrations of about 2% to at least about 5% and preferably less than
about 10% NaCl.
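One way to organize such a screen is to enumerate the test conditions explicitly. The sketch below is an assumption-laden illustration: the step sizes within each range are chosen arbitrarily, and crossing every pH with every temperature and salt level is a simplification, since the text suggests that pH tests would likely use isolated protein and that elevated temperatures call for a soluble assay.

```python
# Sketch: enumerate assay conditions across the ranges given in the text.
# Assumptions: step sizes are illustrative, and a full factorial design
# is used only for simplicity.
from itertools import product

PH_VALUES = [5, 6, 7, 8, 9]          # "about 5 to about 9"
TEMPERATURES_C = [50, 60, 70, 80]    # "about 50 C to about 80 C"
NACL_PERCENT = [2, 3, 4, 5]          # "about 2% to at least about 5%"


def condition_grid(ph_values, temps_c, salt_percents):
    """Return one dict per combination of pH, temperature and NaCl level."""
    return [
        {"pH": ph, "temp_C": temp, "NaCl_pct": salt}
        for ph, temp, salt in product(ph_values, temps_c, salt_percents)
    ]


grid = condition_grid(PH_VALUES, TEMPERATURES_C, NACL_PERCENT)
print(len(grid), grid[0], grid[-1])
```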

Clones identified as having altered enzymatic activity as compared with the
native enzyme are further assessed to determine whether the apparent
enhanced activity of the enzyme is the result of faster or more efficient
AtzA protein production or whether the effect observed is the result of an
altered atzA gene sequence. For example, in Example 2, the atzA gene was
expressed to a high level using pUC18 as a preferred method to rule out
higher in vivo activity due to increased expression.

Once triazine-degrading colonies are identified with the desired
characteristics, the AtzA homologs are isolated for further analysis.
Clones containing putative faster enzyme(s) can be picked, grown in liquid
culture, and the protein homolog can be purified, for example, as described
(de Souza et al. J. Bacteriology 178:4894-4900 (1996)). The genes encoding
the homologs can be modified, as known in the art, for extracellular
expression or the homologs can be purified from bacteria. An exemplary
method for protein purification is provided in Example 4. In a preferred
method, protein was collected from bacteria using ammonium sulfate
precipitation and further purified by HPLC (see for example, de Souza et
al., Appl. Environ. Microbiol. 61:3373-3378 (1995)).

Using these methods, a number of homologs were identified. Homologs can be
identified using the assays discussed in association with this invention
including the precipitable substrate assays on solid agar as described by
Mandelbaum et al. (supra). Homologs identified using the methods of Example
2 were separately screened for atrazine-degrading activity, for enhanced
TERBUTHYLAZINE-degrading activity and for activity against other
s-triazine-containing compounds. An assay for TERBUTHYLAZINE-degrading
activity is provided in Example 6. Two homologs (A7 and T7, see Figs. 1-4)
were found to have at least a 10-fold higher activity than the native AtzA
protein and to differ from it at 8 amino acid positions. A subsequent round
of DNA shuffling starting with the homolog gene sequence yielded further
improvements in activity (A11 and A13 corresponding to nucleic acid SEQ ID
NOS:7-10 and SEQ ID NO:11, respectively). This enzyme and other AtzA
homologs (clones A40, A42, A44, A46, A60 corresponding to nucleic acid SEQ
ID NOS:17-21 and to protein SEQ ID NOS:22-26, respectively) represent
catabolic enzymes modified in their biological activity. Preferred homologs
identified in initial studies include A7, T7, A11, A44, and A46.

Homologs were also identified with altered substrate activity. Both
homologs T7 and A7 were able to degrade TERBUTHYLAZINE better than the
wild-type enzyme. Other homologs capable of degrading TERBUTHYLAZINE
include A42, A44, A46 and A60.

Atrazine chlorohydrolase converts a herbicide to a non-toxic,
non-herbicidal, more highly biodegradable compound and the kinetic
improvement of the homologs has important implications for enzymatic
environmental remediation of this widely used herbicide. Less protein is
required to dechlorinate the same amount of atrazine. Importantly, the
protein can also be used for degradation of the s-triazine compound
TERBUTHYLAZINE.

This invention also relates to nucleic acid and protein sequences
identified from the homologs of this invention. Peptide and nucleic acid
fragments of these sequences are also contemplated and those skilled in the
art can readily prepare peptide fragments, oligonucleotides, probes and
other nucleic acid fragments based on the sequences of this invention. The
homologs of this invention include those with an activity different from
the native atrazine chlorohydrolase (AtzA) protein. As noted supra, an
activity that is different from the native atrazine chlorohydrolase protein
includes enzymatic activity that is improved or is capable of functioning
under different conditions such as salt concentration, temperature, altered
substrate, or the like. Preferably, the DNA encoding the homologs
hybridizes to a DNA molecule complementary to the wild-type coding region
of a DNA molecule encoding wild-type AtzA protein, such as the sequence
provided in SEQ ID NO:1, under high to moderate stringency hybridization
conditions. The homologs preferably have a homology of at least 95% to SEQ
ID NO:1. As used herein, "high stringency hybridization conditions" refers
to, for example, hybridization conditions in buffer containing 0.25 M
Na2HPO4 (pH 7.4), 7% sodium dodecyl sulfate (SDS), 1% bovine serum albumin
(BSA), 1.0 mM ethylene diamine tetraacetic acid (EDTA, pH 8) at 65°C,
followed by washing 3x with 0.1% SDS and 0.1x SSC (0.1x SSC contains 0.015
M sodium chloride and 0.0015 M trisodium citrate, pH 7.0) at 65°C.
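The wash buffer composition follows from straightforward dilution arithmetic. The sketch below derives the 1x SSC concentrations from the 0.1x values given in the text and computes how much concentrated stock would be needed; the 20x stock strength and the function names are assumptions made for illustration and do not appear in the disclosure.

```python
# Sketch: SSC dilution arithmetic for the high-stringency wash.
# From the text: 0.1x SSC = 0.015 M NaCl and 0.0015 M trisodium citrate,
# so 1x SSC = 0.15 M NaCl and 0.015 M trisodium citrate.
# Assumption: the buffer is prepared from a 20x stock.

SSC_1X_MOLAR = {"NaCl": 0.15, "trisodium_citrate": 0.015}


def ssc_concentrations(fold):
    """Molar concentrations of each component at the given SSC strength."""
    return {name: molar * fold for name, molar in SSC_1X_MOLAR.items()}


def stock_volume_ml(target_fold, final_volume_ml, stock_fold=20.0):
    """Volume of stock needed to make final_volume_ml at target_fold."""
    return final_volume_ml * target_fold / stock_fold


wash = ssc_concentrations(0.1)
stock_ml = stock_volume_ml(0.1, 1000.0)   # stock per liter of 0.1x wash
print(wash, stock_ml)
```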

A number of homologs have been identified using the methods of this
invention. For example, SEQ ID NO:3 is the gene sequence of a homolog A7 of
the atzA gene that shows enhanced atrazine degradation activity and,
surprisingly, also demonstrated enhanced TERBUTHYLAZINE degradation
activity. TERBUTHYLAZINE degradation experiments are provided in Example 6.
The amino acid sequence of the enzyme encoded by SEQ ID NO:3 is identified
as SEQ ID NO:5. SEQ ID NO:4 is the gene sequence of the homolog T7 of the
atzA gene that shows enhanced atrazine degradation activity and enhanced
TERBUTHYLAZINE degradation activity. A summary of the TERBUTHYLAZINE
degradation activity for T7 and A7 is provided in Example 6. SEQ ID NO:6
provides the amino acid sequence of the homolog encoded by SEQ ID NO:4.
Fig. 1 provides the nucleotide sequence alignment of wild-type atzA from
SEQ ID NO:1 with SEQ ID NO:3 and Fig. 2 provides the nucleotide sequence
alignment of SEQ ID NO:1 with SEQ ID NO:4. Fig. 3 provides the amino acid
sequence alignment of SEQ ID NO:2, the amino acid sequence of the protein
encoded by SEQ ID NO:1, with SEQ ID NO:5 and Fig. 4 provides the amino acid
sequence alignment of SEQ ID NO:2 with SEQ ID NO:6. A review of the
sequences encoding A7 and T7 indicates that both homologs have a total of 8
amino acid changes relative to native AtzA (SEQ ID NO:2). Seven amino acid
changes are common to both A7 and T7. Other homologs with altered activity
include A40 (nucleic acid SEQ ID NO:17; amino acid sequence SEQ ID NO:22);
A42 (nucleic acid SEQ ID NO:18; amino acid sequence SEQ ID NO:23); A44
(nucleic acid SEQ ID NO:19; amino acid sequence SEQ ID NO:24); A46 (nucleic
acid SEQ ID NO:20; amino acid sequence SEQ ID NO:25); and A60 (nucleic acid
SEQ ID NO:21; amino acid sequence SEQ ID NO:26).
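The sequence assignments above can be kept straight with a small lookup table, and amino acid changes between two aligned sequences of equal length can be counted position by position. In the sketch below the SEQ ID numbers are taken from the text, while the helper function and the short peptide strings are hypothetical stand-ins and are not the actual AtzA sequences.

```python
# Sketch: lookup of SEQ ID NOs per clone, plus a substitution counter.
# The SEQ ID numbers come from the text; the peptides used in the example
# are hypothetical and are NOT the real AtzA sequences.

SEQ_IDS = {
    "wild-type": {"nucleic": 1, "protein": 2},
    "A7": {"nucleic": 3, "protein": 5},
    "T7": {"nucleic": 4, "protein": 6},
    "A40": {"nucleic": 17, "protein": 22},
    "A42": {"nucleic": 18, "protein": 23},
    "A44": {"nucleic": 19, "protein": 24},
    "A46": {"nucleic": 20, "protein": 25},
    "A60": {"nucleic": 21, "protein": 26},
}


def count_substitutions(reference, variant):
    """Count differing positions between two gap-free, equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for ref, var in zip(reference, variant) if ref != var)


toy_reference = "MQTLSIQ"   # hypothetical peptide
toy_variant = "MQALSIK"     # hypothetical peptide with two changes
changes = count_substitutions(toy_reference, toy_variant)
print(SEQ_IDS["A44"], changes)
```

Applied to SEQ ID NO:2 against SEQ ID NO:5 or SEQ ID NO:6, a count of this kind would be expected to give the 8 changes reported for A7 and T7, provided the alignments in Figs. 3 and 4 contain no gaps.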

Without intending to limit the scope of this invention, the success
attributed to the identification of homologs of AtzA may be based on the
recognition that this protein is not evolutionarily mature. Therefore, not
all gene sequences are good candidates as the starting material for
identifying a number of biological variants of a particular protein and
similarly, not all enzymes are amenable to the order of magnitude of rate
enhancement by directed evolution using DNA shuffling or other methods.
Without intending to limit the scope of this invention, it is believed that
some enzymes are already processing substrates at their theoretical rate
limit. In these cases, catalysis is limited by the physical diffusion of
the substrate onto the catalytic surface of the enzyme. Thus, changes in
the enzyme would not likely improve the rate of catalysis. Examples of
enzymes that operate at or near catalytic "perfection" are triosephosphate
isomerase, fumarase, and crotonase (available from the GenBank database
system). Even biodegradative enzymes that hydrolyze toxic substrates fall
into this class. For example, the phosphotriesterase that hydrolyzes
paraoxon operates near enough to the diffusion limit to suggest that it
would not be a good candidate for mutagenic methods to improve the
catalytic rate constant of the enzyme with its substrate (see Caldwell et
al., Biochem. 30:7438-7444 (1991)).

The gene sequences of this invention can be incorporated into a variety of
vectors. Preferably, the vector includes a region encoding a homolog of
AtzA and the vector can also include other DNA segments operably linked to
the coding sequence in an expression cassette, as required for expression
of the homologs, such as a promoter region operably linked to the 5' end of
the coding DNA sequence, a selectable marker gene, a reporter gene, and the
like.

The present invention also provides recombinant cells expressing the
homologs of this invention. For example, DNA that expresses the homologs of
this invention can be expressed in a variety of bacterial strains including
E. coli sp. strains and Pseudomonas sp. strains. Other organisms include,
but are not limited to, Rhizobium, Bacillus, Bradyrhizobium, Arthrobacter,
Alcaligenes, and other rhizosphere and nonrhizosphere soil microbe strains.

In addition to prokaryotes, eukaryotic microbes such as filamentous fungi
or yeast are suitable hosts for vectors encoding atzA or its homologs.
Saccharomyces cerevisiae, or common baker's yeast, is the most commonly
used among lower eukaryotic host microorganisms. However, a number of other
genera, species, and strains are commonly available and useful herein, such
as Schizosaccharomyces pombe, Kluyveromyces hosts such as, e.g., K. lactis,
K. fragilis, K. bulgaricus, K. thermotolerans, and K. marxianus, Pichia
pastoris, Candida, Trichoderma reesei, Neurospora crassa, and filamentous
fungi such as, e.g., Neurospora, Penicillium, Tolypocladium, and
Aspergillus hosts such as A. nidulans.

Prokaryotic cells used to produce the homologs of this invention are
cultured in suitable media, as described generally in Maniatis et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold
Spring Harbor, NY (1989). Any necessary supplements may also be included at
appropriate concentrations that would be known to those skilled in the art.
In general the E. coli expressing the homologs of this invention are
readily cultured in LB media (see Maniatis, supra). The culture conditions,
such as temperature, pH, and the like, are those previously used with the
host cell selected for expression, and will be apparent to those skilled in
the art. Induction of cells to express the AtzA protein is accomplished
using the procedures required by the particular expression system selected.
The host cells referred to in this disclosure are generally cultured in
vitro. Cells are harvested, and cell extracts are prepared, using standard
laboratory protocols.

This invention also relates to isolated proteins that are the product of
the gene sequences of this invention. The isolated proteins are protein
homologs of the wild-type atrazine chlorohydrolase enzyme despite their
potential for altered substrate preference. The protein can be isolated by
a variety of methods disclosed in the art and a preferred method for
isolating the protein is provided in Examples 4 and 5 and in the
publications of de Souza et al. (supra).

The wild-type AtzA protein acts on atrazine, desethylatrazine,
desisopropylatrazine and SIMAZINE but does not degrade
desethyldesisopropylatrazine or MELAMINE and only poorly degrades
TERBUTHYLAZINE. Homologs identified in this invention have a spectrum of
substrate preferences identical to the wild-type AtzA protein and in
addition, for example, are able to degrade other substrates such as
TERBUTHYLAZINE. That homologs were identified that were capable of
degrading two different s-triazine-containing compounds suggests that the
methods of this invention can be used on the wild-type progenitor atzA gene
or on the homologs produced by this invention to produce even more useful
proteins for environmental remediation of s-triazine-containing compounds.
Example 7 provides an assay for detecting degradation, including
deamination, of a soluble s-triazine-containing compound.

Various environmental remediation techniques are known that utilize high
levels of proteins. Bacteria or other hosts expressing the homologs of this
invention can be added to a remediation mix or mixture in need of
remediation to promote contaminant degradation. Alternatively, isolated
AtzA homologs can be added. Proteins can be bound to immobilization
supports, such as beads, particles, films, etc., made from latex, polymers,
alginate, polyurethane, plastic, glass, polystyrene, and other natural and
man-made support materials. Such immobilized protein can be used in
packed-bed columns for treating water effluents. The protein can be used to
remediate liquid samples, such as contaminated water, or solids. An
advantage of some of the homologs identified thus far is that they
demonstrate an ability to degrade more than one substrate and to degrade
the substrate at a faster rate or under different reaction conditions from
the native enzyme.

All references and publications cited herein are incorporated by reference
into this disclosure. The invention will be further described by reference
to the following detailed examples. Particular embodiments of this
invention will be discussed in detail and reference has been made to
possible variations within the scope of this invention. There are a variety
of alternative techniques and procedures available to those of skill in the
art which would similarly permit one to successfully perform the intended
invention that do not detract from the spirit and scope of this invention.



Example 1 

Isolation of Wild-type atzA gene from Pseudomonas sp. strain ADP 
Bacterial strains and growth conditions. 



Pseudomonas sp. strain ADP (Mandelbaum et al., Appl. Environ. Microbiol.
59, 1695-1701 (1993)) was grown on modified minimal salt buffer medium,
containing 0.5% (wt/vol) sodium citrate dihydrate. The atrazine stock
solution was prepared as described in Mandelbaum et al., Appl. Environ.
Microbiol. 61, 1451-1457 (1995). Escherichia coli DH5α was grown in
Luria-Bertani (LB) or M63 minimal medium, which are described in Maniatis
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press,
Cold Spring Harbor, NY (1989). Tetracycline (15 µg/ml), kanamycin (20
µg/ml), and chloramphenicol (30 µg/ml) were added as required.

To construct the Pseudomonas sp. strain ADP genomic library, total genomic
DNA was partially digested with EcoRI, ligated to the EcoRI-digested cosmid
vector pLAFR3 DNA, and packaged in vitro. The completed genomic DNA library
contained 2000 colonies.

To identify the atrazine-degrading clones, the entire gene library was
replica-plated onto LB medium containing 500 µg/ml atrazine and 15 µg/ml
tetracycline. Fourteen colonies having clearing zones were identified. All
fourteen clones degraded atrazine, as determined by HPLC analysis. Cosmid
DNA isolated from the fourteen colonies contained cloned DNA fragments
which were approximately 22 kb in length. The fourteen clones could be
subdivided into six groups on the basis of restriction enzyme digestion
analysis using EcoRI. All fourteen clones, however, contained the same 8.7
kb EcoRI fragment. Thirteen of the colonies, in addition to degrading
atrazine, also produced an opaque material that surrounded colonies growing
on agar medium. Subsequent experiments indicated that the opaque material
was observed only in E. coli clones which accumulated hydroxyatrazine.
Thus, the cloudy material surrounding E. coli pMD2-pMD4 colonies was due to
the deposition of hydroxyatrazine in the growth medium. The one colony that
degraded atrazine without the deposition of the opaque material was
selected for further analysis. The clone from this colony was designated
pMD1.

Example 2 
Mutagenesis Procedure 

Gene Shuffling. The atzA and atzB genes were subcloned from pMD1 into
pUC18. The two inserts were reduced in size to remove extraneous DNA. A 1.9
kb AvaI fragment containing atzA was end-filled and cloned into the
end-filled AvaI site of pUC18. A 3.9 kb ClaI fragment containing atzB was
end-filled and cloned into the HincII site of pUC18. The gene atzA was then
excised from pUC18 with EcoRI and BamHI, atzB with BamHI and HindIII, and
the two inserts were co-ligated into pUC18 digested with EcoRI and HindIII.
The result was a 5.8 kb insert containing atzA and atzB in pUC18 (total
plasmid size 8.4 kb).

Recursive sequence recombination was performed by modifications of existing
procedures (Stemmer, W., Proc. Natl. Acad. Sci. USA 91:10747-10751 (1994)
and Stemmer, W., Nature 370:389-391 (1994)). The entire 8.4 kb plasmid was
treated with DNAase I in 50 mM Tris-Cl pH 7.5, 10 mM MnCl2 and fragments
between 500 and 2000 bp were gel purified. The fragments were assembled in
a PCR reaction using Tth-XL enzyme and buffer from Perkin Elmer, 2.5 mM
MgOAc, 400 µM dNTPs and serial dilutions of DNA fragments. The assembly
reaction was performed in an MJ Research "DNA Engine" thermocycler
programmed with the following cycles:

1  94°C, 20 seconds
2  94°C, 15 seconds
3  40°C, 30 seconds
4  72°C, 30 seconds + 2 seconds per cycle
5  go to step 2 39 more times
6  4°C
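The cumulative hold time of the cycling program above can be estimated with a short calculation. The sketch below assumes that steps 2 to 4 run 40 times in total (one pass plus 39 repeats), that the 2-second increment is first applied in the second cycle, and that ramp times and the final hold are ignored; the names are illustrative only.

```python
# Sketch: total programmed hold time for the assembly reaction.
# Assumptions: 40 cycles in total, the extension increment starts at zero
# in the first cycle, and ramping and the final hold are not counted.

INITIAL_DENATURE_S = 20    # step 1
DENATURE_S = 15            # step 2
ANNEAL_S = 30              # step 3
EXTEND_BASE_S = 30         # step 4, first cycle
EXTEND_INCREMENT_S = 2     # added to step 4 in each later cycle
CYCLES = 40                # one pass plus "39 more times"


def extension_time(cycle_index):
    """Extension time in seconds for a zero-based cycle index."""
    return EXTEND_BASE_S + EXTEND_INCREMENT_S * cycle_index


def total_hold_time(cycles=CYCLES):
    """Sum of all programmed hold times in seconds."""
    per_cycle = (DENATURE_S + ANNEAL_S + extension_time(i) for i in range(cycles))
    return INITIAL_DENATURE_S + sum(per_cycle)


total_s = total_hold_time()
print(total_s, round(total_s / 60, 1), extension_time(CYCLES - 1))
```

Under these assumptions the final extension lasts 108 seconds and the program holds for 4580 seconds in total, or a little over 76 minutes before ramping is taken into account.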

The atzA gene could not be amplified from the assembly reaction using the
polymerase chain reaction, so instead DNA from the reaction was purified by
standard phenol extraction and ethanol precipitation methods and digested
with KpnI to linearize the plasmid (the KpnI site in pUC18 was lost during
subcloning, leaving only the KpnI site in atzA). Linearized plasmid was
gel-purified, self-ligated overnight and transformed into E. coli strain
NM522.

Serial dilutions of the transformation reaction were plated onto LB plates
containing 50 µg/ml ampicillin; the remainder of the transformation was
stored in 25% glycerol and frozen at -80°C. Once the transformed cells were
titered, the frozen cells were plated at a density of between 200 and 500
colonies on 150 mm diameter plates containing 500 µg/ml atrazine or another
substrate and grown at 37°C.
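The titering step amounts to a short calculation: colony counts from the serial dilutions give a titer, which in turn gives the volume of frozen stock to spread for 200 to 500 colonies per plate. The sketch below uses hypothetical counts and volumes, and it assumes that viability is unchanged by freezing, which may not hold in practice.

```python
# Sketch: from a dilution plate count to the volume needed per screening plate.
# All numbers in the example are hypothetical, and loss of viability on
# freezing is ignored (an assumption).


def titer_cfu_per_ml(colonies, dilution_fold, plated_volume_ml):
    """Colony-forming units per ml of the undiluted transformation."""
    if colonies < 0 or dilution_fold <= 0 or plated_volume_ml <= 0:
        raise ValueError("counts and volumes must be positive")
    return colonies * dilution_fold / plated_volume_ml


def volume_to_plate_ml(titer, target_colonies):
    """Volume of undiluted stock expected to give target_colonies."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return target_colonies / titer


# Hypothetical: 85 colonies from 0.1 ml of a 100-fold dilution.
titer = titer_cfu_per_ml(85, 100, 0.1)
low_ml = volume_to_plate_ml(titer, 200)    # lower end of the target density
high_ml = volume_to_plate_ml(titer, 500)   # upper end of the target density
print(titer, low_ml, high_ml)
```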

Atrazine at 500 µg/ml forms an insoluble precipitate creating a cloudy
appearance on the agar plate. The solubility of atrazine is about 30 µg/ml,
therefore for precipitable substrate assays, such as the assay disclosed
here, the atrazine concentration should preferably be greater than 30
µg/ml. Atrazine or hydroxyatrazine were incorporated in solid LB or minimal
medium, as described in Mandelbaum et al., Appl. Environ. Microbiol. 61,
1451-1457 (1995), at a final concentration of 500 µg/ml to produce an
opaque suspension of small particles in the clear agar. AtzA and the
homologs with atrazine-degrading activity convert atrazine into a soluble
product. The degradation of atrazine or hydroxyatrazine by wild-type and
recombinant bacteria was indicated by a zone of clearing surrounding
colonies. The more active the homolog, the more rapidly a clear halo formed
on atrazine-containing plates. Positive colonies that most rapidly formed
the largest clear zones were selected initially for further analysis. The
(approximately) 40 best colonies were picked, pooled, grown in the presence
of 50 µg/ml ampicillin and plasmid prepared from them. More efficient
enzymes can also be tested using atrazine concentrations greater than 500
µg/ml.
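The reason the plate turns cloudy can be put in numbers. The sketch below estimates how much atrazine is present as precipitate at a given plate concentration, under the simplifying assumption that everything above the roughly 30 µg/ml solubility limit is undissolved; effects of temperature and medium composition are ignored, and the function names are illustrative.

```python
# Sketch: amount of atrazine present as precipitate in the agar.
# Assumption: all material above the solubility limit is undissolved.

ATRAZINE_SOLUBILITY_UG_PER_ML = 30.0   # "about 30 ug/ml" in the text


def precipitated_ug_per_ml(total_ug_per_ml,
                           solubility=ATRAZINE_SOLUBILITY_UG_PER_ML):
    """Concentration in excess of the solubility limit (never negative)."""
    return max(0.0, total_ug_per_ml - solubility)


def precipitated_fraction(total_ug_per_ml,
                          solubility=ATRAZINE_SOLUBILITY_UG_PER_ML):
    """Fraction of the total substrate that is present as precipitate."""
    if total_ug_per_ml <= 0:
        raise ValueError("total concentration must be positive")
    return precipitated_ug_per_ml(total_ug_per_ml, solubility) / total_ug_per_ml


excess = precipitated_ug_per_ml(500.0)     # standard plate concentration
fraction = precipitated_fraction(500.0)
print(excess, fraction)
```

At the 500 µg/ml used on the plates, about 470 µg/ml, or 94% of the substrate, would be in suspension under this assumption, while a plate at or below 30 µg/ml would show no precipitate and therefore no clearing zone.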

The entire process (from DNAase-treatment to plating on atrazine plates)
was repeated 4 times as a method for further improving on the rate of
enzymatic activity. In several experiments, cells were plated on plates
containing 500 µg/ml atrazine and on plates containing 500 µg/ml of the
atrazine analogue TERBUTHYLAZINE.

Other compounds can be tested in similar assays by replacing atrazine
(2-chloro-4-ethylamino-6-isopropylamino-1,3,5-s-triazine) with any of the
following compounds. Desethylatrazine
(2-chloro-4-amino-6-isopropylamino-s-triazine), deisopropylatrazine
(2-chloro-4-ethylamino-6-amino-s-triazine), hydroxyatrazine
(2-hydroxy-4-ethylamino-6-isopropylamino-s-triazine),
desethylhydroxyatrazine (2-hydroxy-4-amino-6-isopropylamino-s-triazine),
desisopropylhydroxyatrazine (2-hydroxy-4-ethylamino-6-amino-s-triazine),
desethyldesisopropylatrazine (2-chloro-4,6-diamino-s-triazine), SIMAZINE
(2-chloro-4,6-diethylamino-s-triazine), TERBUTHYLAZINE
(2-chloro-4-ethylamino-6-terbutylamino-s-triazine), and MELAMINE
(2,4,6-triamino-s-triazine) were obtained from Ciba Geigy Corp.,
Greensboro, N.C. Ammelide (2,4-dihydroxy-6-amino-s-triazine) and ammeline
(2-hydroxy-4,6-diamino-s-triazine) were obtained from Aldrich Chemical Co.,
Milwaukee, WI.



Example 3 

DNA Sequencing of Wild-Type atzA and Homolog atzA genes 

DNA Sequencing. The nucleotide sequence of the approximately
1.9-kb AvaI DNA fragment in vector pACYC184, designated pMD4, or the
homologs in pUC18 or another vector was determined using both DNA strands.
DNA was sequenced by using a PRISM Ready Reaction DyeDeoxy Terminator
Cycle Sequencing kit (Perkin-Elmer Corp., Norwalk, CT) and an ABI Model
373A DNA Sequencer (Applied Biosystems, Foster City, CA). Nucleotide
sequence was determined initially by subcloning and subsequently by using
primers designed based on sequence information obtained from subcloned DNA
fragments. The GCG sequence analysis software package (Genetics Computer
Group, Inc., Madison, WI) was used for all DNA and protein sequence
comparisons. Radiolabelled chemicals were obtained from Ciba Geigy Corp.,
Greensboro, N.C.

Example 4

Protein Purification of AtzA or Homologs

E. coli transformed with a vector containing the wild type atzA
gene or alternatively with a homolog, in a vector capable of directing expression
of the gene as a protein, was grown overnight at 37°C in eight liters of LB
medium containing 25 µg/ml chloramphenicol. The culture was
centrifuged at 10,000 x g for 10 minutes at 4°C, the cells were washed in 0.85%
NaCl, and the cell pellet was resuspended in 50 ml of 25 mM MOPS buffer
(3-[N-morpholino]propane-sulfonic acid, pH 6.9), containing
phenylmethylsulfonylfluoride (100 µg/ml). The cells were broken by three
passages through an Amicon French Pressure Cell at 20,000 pounds per square
inch (psi) at 4°C. Cell-free extract was obtained by centrifugation at 10,000 x g
for 15 minutes. The supernatant was clarified by centrifugation at 18,000 x g for
60 minutes and solid (NH4)2SO4 was added, with stirring, to a final concentration
of 20% (wt/vol) at 4°C. The solution was stirred for 30 minutes at 4°C and
centrifuged at 12,000 x g for 20 minutes. The precipitated material was
resuspended in 50 ml of 25 mM MOPS buffer (pH 6.9), and dialyzed overnight
at 4°C against 1 liter of 25 mM MOPS buffer (pH 6.9).

Where purified protein was desired, the solution was loaded onto a
Mono Q HR 16/10 Column (Pharmacia LKB Biotechnology, Uppsala, Sweden).
The column was washed with 25 mM MOPS buffer (pH 6.9), and the protein
was eluted with a 0-0.5 M KCl gradient. Protein eluting from the column was
monitored at 280 nm by using a Pharmacia U.V. protein detector. Pooled
fractions containing the major peak were dialyzed overnight against 1 liter of 25
mM MOPS buffer (pH 6.9). The dialyzed material was assayed for atrazine
degradation ability by using HPLC analysis (see above) and analyzed for purity
by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis (Laemmli).

Protein Verification: Protein subunit sizes were determined by
SDS polyacrylamide gel electrophoresis by comparison to known standard
proteins, using a Mini-Protean II gel apparatus (Biorad, Hercules, CA). The size
of the holoenzyme was determined by gel filtration chromatography on a
Superose 6 HR (1.0 x 30.0 cm) column, using an FPLC System (Pharmacia,
Uppsala, Sweden). The protein was eluted with 25 mM MOPS buffer (pH 6.9)
containing 0.1 M NaCl. Proteins with known molecular weights were used as
chromatography standards. Isoelectric point determinations were done using a
Pharmacia Phast-Gel System and Pharmacia IEF 3-9 media. A Pharmacia broad-
range pI calibration kit was used for standards.

Enzyme Kinetics. Purified AtzA protein and homologs of the
protein, at 50 µg/ml, were separately added to 500 µl of different concentrations
of atrazine (23.3 µM, 43.0 µM, 93 µM, 233 µM, and 435 µM in 25 mM MOPS
buffer, pH 6.9) or another s-triazine-containing compound and reactions were
allowed to proceed at room temperature for 2, 5, 7, and 10 minutes. The
reactions were stopped at the specified times by boiling the reaction tubes,
adding 500 µl acetonitrile, and rapidly freezing at -80°C. Thawed samples were
centrifuged at 14,000 rpm for 10 minutes, and the supernatants were filtered through
a 0.2 µm filter and placed into crimp-seal HPLC vials. HPLC analysis was done
as described above. Based on HPLC data, initial rates of atrazine degradation and
hydroxyatrazine formation were calculated and Michaelis-Menten and
Lineweaver-Burk plots were constructed.
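
The Lineweaver-Burk construction mentioned above can be sketched as a straight-line fit of 1/v against 1/[S]. The following is only an illustrative sketch: the substrate concentrations are those listed in the text, but the initial rates are hypothetical values generated from assumed constants (Km = 150 µM, Vmax = 10, arbitrary units) and are not data from this application.

```python
def lineweaver_burk(substrate_uM, initial_rates):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by least squares.

    Returns (Km, Vmax) in the units of the inputs.
    """
    xs = [1.0 / s for s in substrate_uM]
    ys = [1.0 / v for v in initial_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Atrazine concentrations from the text (µM).
substrate = [23.3, 43.0, 93.0, 233.0, 435.0]

# HYPOTHETICAL initial rates, generated from assumed constants so the
# fit can be checked; real rates would come from the HPLC time courses.
ASSUMED_KM, ASSUMED_VMAX = 150.0, 10.0
rates = [ASSUMED_VMAX * s / (ASSUMED_KM + s) for s in substrate]

km, vmax = lineweaver_burk(substrate, rates)
print(f"Km = {km:.1f} uM, Vmax = {vmax:.2f}")
```

With measured rates, the double-reciprocal fit weights low-concentration points heavily, so a direct nonlinear fit of the Michaelis-Menten equation is often used as a cross-check.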

Effect of simple nitrogen sources on atrazine degradation.
From experiments done with Pseudomonas species strain ADP on solid media
with 500 ppm atrazine and varying concentrations of ammonium chloride,
ammonium chloride concentrations as low as 0.6-1.2 mM were sufficient to
inhibit visible clearing on the plates, even after 2 weeks of incubation at either
28°C or 37°C. With similar experiments using E. coli DH5α (pMD1 or pMD2)
and other E. coli strains, atrazine degradation was observed in the presence of
ammonium chloride concentrations as high as 48 mM. This value is about 40-
to 80-fold higher than the wild-type tolerance for ammonium chloride with
concomitant atrazine degradation (48 mM / 1.2 mM = 40; 48 mM / 0.6 mM = 80).
Therefore, it was not necessary to use media free of exogenous ammonia in the
screening assays.

Example 5 

Further characterization of the enzymatic activity of the homologs

Analysis of atrazine metabolism by E. coli clones. The extent
and rate of atrazine degradation were determined in liquid culture. E. coli clones
containing plasmids capable of expressing the homologs were compared to
Pseudomonas sp. strain ADP for their ability to transform ring-labelled
[14C]-atrazine to water-soluble metabolites. This method, which measures
[14C]-label partitioning between organic and aqueous phases, had previously
been used with Pseudomonas sp. ADP to show the transformation of atrazine to
metabolites that partition into the aqueous phase, in Mandelbaum et al., Appl.
Environ. Microbiol. 61, 1451-1457 (1995). When Pseudomonas sp. strain ADP
or E. coli capable of expressing the homologs of this invention were incubated
for 2 hours with [14C]-atrazine, 98%, 97%, 88%, and 92%, respectively, of the
total recoverable radioactivity was found in the aqueous phase. Greater than
90% of the initial radioactivity was accounted for as atrazine plus water soluble
metabolites, indicating that little or no 14CO2 was formed. In contrast, forty-four
percent of the radioactivity was lost from the Pseudomonas ADP culture after
18.5 hours. In previous studies done with Pseudomonas sp. strain ADP and ring-
labelled 14C-atrazine, radiolabel was lost from culture filtrates as 14CO2 (see, e.g.,
Mandelbaum et al., Appl. Environ. Microbiol. 61, 1451-1457 (1995)).
Retention of the radiolabel is indicative of lack or inhibition of enzymatic
activity. While these studies were performed for AtzA, similar studies are used
to assess the activity of the homologs of this invention.
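
The two percentages reported in the partitioning assay above (label found in the aqueous phase, and initial label accounted for) follow from simple ratios of scintillation counts. A minimal sketch is given below; the function names and the count values are hypothetical and serve only to show the arithmetic.

```python
def percent_aqueous(aqueous_dpm, organic_dpm):
    """Percent of total recoverable radioactivity found in the aqueous phase."""
    total = aqueous_dpm + organic_dpm
    if total <= 0:
        raise ValueError("no recoverable radioactivity")
    return 100.0 * aqueous_dpm / total


def percent_recovered(aqueous_dpm, organic_dpm, initial_dpm):
    """Percent of the initial radioactivity still accounted for in both phases."""
    if initial_dpm <= 0:
        raise ValueError("initial radioactivity must be positive")
    return 100.0 * (aqueous_dpm + organic_dpm) / initial_dpm


# HYPOTHETICAL counts (disintegrations per minute), not data from this application.
aqueous, organic, initial = 9200.0, 800.0, 10500.0

aq_pct = percent_aqueous(aqueous, organic)
rec_pct = percent_recovered(aqueous, organic, initial)
print(f"aqueous phase: {aq_pct:.1f}% of recoverable label")
print(f"accounted for: {rec_pct:.1f}% of initial label")
```

A recovery well below 100% would point to loss of label from the culture, for example as 14CO2, as described for the longer Pseudomonas incubation.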

Example 6 

Assays to detect homologs of AtzA on TERBUTHYLAZINE

TERBUTHYLAZINE was incorporated in solid LB medium at a
final concentration of about 400-500 µg/ml to produce an opaque suspension of
small particles in the clear agar. The degradation of terbuthylazine by
recombinant bacteria was indicated by a zone of clearing surrounding the
colonies. HPLC analysis was performed with a Hewlett Packard HP 1090
Liquid Chromatograph system equipped with a photodiode array detector and
interfaced to an HP 79994A Chemstation. TERBUTHYLAZINE and its
metabolites were resolved by using an analytical reverse-phase Nova-Pak
HPLC column (4-µm-diameter spherical packing, 150 by 3.9 mm; Waters
Chromatography, Milford, Mass.) and an acetonitrile (ACN) gradient, in water,
at a flow rate of 1.0 ml min-1. Linear gradients of 0 to 5 min, 10 to 25% ACN;
5 to 21 min, 25 to 65% ACN; 21 to 23 min, 65 to 100% ACN; and 23 to 25 min,
100% ACN were used. Spectral data of the column eluent were acquired
between 200 and 400 nm (12-nm bandwidth per channel) at a sampling
frequency of 640 ms. Spectra were referenced against a signal of 500 nm.
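
A stepwise linear gradient of this kind can be written as a table of (time, % ACN) breakpoints with linear interpolation between them. The sketch below assumes the program reads 0 to 5 min, 10 to 25% ACN; 5 to 21 min, 25 to 65%; 21 to 23 min, 65 to 100%; and 23 to 25 min, 100%. These breakpoints are an assumed reading of the text and should be checked against the original method before use.

```python
# ASSUMED breakpoints (minutes, percent acetonitrile); an illustrative reading
# of the gradient described in the text, not a verified instrument method.
GRADIENT = [(0.0, 10.0), (5.0, 25.0), (21.0, 65.0), (23.0, 100.0), (25.0, 100.0)]


def percent_acn(t_min, program=GRADIENT):
    """Return % ACN at time t_min by linear interpolation between breakpoints."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, c0), (t1, c1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return c0 + (c1 - c0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program is not contiguous")


for t in (0, 5, 13, 22, 25):
    print(f"{t:>2} min: {percent_acn(t):.1f}% ACN")
```

Keeping the program as data makes it easy to compare retention times obtained under slightly different gradient readings.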

Comparative results of an assay to assess TERBUTHYLAZINE
degradation are provided in Figures 7 and 8. Figure 7(a) provides a histogram
demonstrating the relative percentage of TERBUTHYLAZINE remaining in
samples tested while Figure 7(b) provides a measure of the production of
hydroxyterbuthylazine as a measure of TERBUTHYLAZINE degradation.
Sample 1 is a control sample without enzyme. Sample 2 uses a two fold excess
of AtzA protein as compared to the concentration of homolog added in Sample 3
and Sample 4. Sample 3 employed the T7 homolog (SEQ ID NO:6) and Sample
4 employed the A7 homolog (SEQ ID NO:5). Results were determined by
HPLC as described above. Figure 8(a) provides the percentage of
TERBUTHYLAZINE remaining after a 15 minute exposure to homologs A7,
A11, and T7. Samples 1-10 refer to the effect of homolog activity in the
presence of 50 µM of: manganese (1); magnesium (2); EDTA (3); cobalt (4);
zinc (5); iron (6); copper (7); nickel (8); no metal (9); or no enzyme (10). Figure
8(b) provides the relative amount of hydroxyterbuthylazine as a measure of
TERBUTHYLAZINE degradation for homologs A7 (solid bar), A11 (hatched
bar), or T7 (open bar) in the presence or absence of additives (1-10, supra).

Example 7

Assays to detect homologs of AtzA on "MELAMINE"

"MELAMINE" (2,4,6-triamino-s-triazine) at a concentration of at
least about 1 mM to about 5 mM and preferably about 2 mM MELAMINE is
incorporated into solid minimal nutrient media as the sole nitrogen source.
Bacteria are distributed on the plate and growth of the organisms is indicative of
their ability to degrade MELAMINE, thereby releasing ammonia for growth.
Growth is evidence of the ability of the organisms expressing the homologs of
this invention to deaminate MELAMINE. There is more than one nitrogen-
containing group in MELAMINE. Therefore the selection of larger colonies on
MELAMINE containing solid minimal nutrient media could be used to select for
faster MELAMINE-degrading homologs.

A comparison of the nucleic acid sequence from a wild type
MELAMINE degrading Pseudomonas NRRL B-12227 strain with the
atzA gene sequence indicated a homology of more than 90% over a 500 base
pair sequence obtained from the NRRL B-12227 strain using primers that were
internal to atzA, suggesting that homologs of atzA could be identified that degrade
"MELAMINE." This strain did not degrade atrazine. Moreover, homologs
identified using the methods of Example 2 are subjected to further mutagenesis
and colonies capable of growing in MELAMINE can be identified. Colonies
containing the protein AtzA are tested for growth in MELAMINE under
identical conditions. Other s-triazine containing compounds such as the
pesticides available under the tradenames "AMETRYN", "PROMETRYN",
"PROMETRON", "ATRATON" and "CYROMAZINE" could also function as
substrates for other homologs of this invention.
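
The sequence comparison described in this example reduces, for two already-aligned stretches of equal length, to counting matching positions. The sketch below shows that calculation only; it is not the GCG procedure used in this application. The reference fragment is the first 30 nucleotides of the atzA coding region as listed in SEQ ID NO:3, and the second sequence is a hypothetical variant carrying two substitutions.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Start of the atzA open reading frame as printed in the sequence listing.
reference = "ATGCAAACGCTCAGCATCCAGCACGGTACC"

# HYPOTHETICAL variant: two point substitutions introduced for illustration.
bases = list(reference)
bases[5] = "G"
bases[20] = "A"
variant = "".join(bases)

identity = percent_identity(reference, variant)
print(f"{identity:.1f}% identity over {len(reference)} positions")
```

Real comparisons over a 500 base pair region would first require an alignment step that handles insertions and deletions.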

It will be appreciated by those skilled in the art that while the
invention has been described above in connection with particular embodiments
and examples, the invention is not necessarily so limited and that numerous other
embodiments, examples, uses, modifications and departures from the
embodiments, examples and uses may be made without departing from the
inventive scope of this application.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: REGENTS OF THE UNIVERSITY OF MINNESOTA
(ii) TITLE OF INVENTION: DNA MOLECULES AND PROTEIN DISPLAYING
IMPROVED TRIAZINE COMPOUND DEGRADING ABILITY
(iii) NUMBER OF SEQUENCES: 26
(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: MUETING, RAASCH & GEBHARDT, P.A.

(B) STREET: 119 North Fourth Street

(C) CITY: Minneapolis

(D) STATE: Minnesota

(E) COUNTRY: USA

(F) ZIP: 55401

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) PRIORITY APPLICATION DATA:

(A) APPLICATION NUMBER: 60/035,404

(B) FILING DATE: 17-JAN-1997

(C) CLASSIFICATION:

(vii) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: Not Assigned

(B) FILING DATE: 16-JAN-1998

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: MCCORMACK, MYRA M.

(B) REGISTRATION NUMBER: 36,602

(C) REFERENCE/DOCKET NUMBER: 110.00400201

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 612-305-1225

(B) TELEFAX: 612-305-1228

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1858 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CTC„ TTCTTGACCC COCCCAC«C A^TC ATOAAOCCCA .CA^.TO^c 
CTT.ACOCCO „CTTXT«T TCTCTT^ OAAC^CACO CCAAACCTT CCA.CTCCOT 
CATOTCCOCO TCOTCG^ 

TTTCGAT^Ce ATAATATC^ CXTC^COACC TGTAACACAC TATT.OACAC ATATCATCCA 
AACOCTCACC ATCCACCACC STACCCTCCT CACGATGGAT CA.TACCCA <«CTCCTT.C 
OOATA.C«G G^COTCC oATC=TCOCO CTO«AOT=C ACaCCCAGTC 

CGT.CCXCC. CCA=C«=A^ ^^^^^^ ^^^^^ 

CATCAA«=CC CACACCCATO «AACCAOAT CCTCCTCCOC C«A«=.CCCX C«CAC=^C0 
TCAA^CTA. OAC„ .cAACOTTOT OTATCO^. CAAAACCCOA T3A,»CC0.A 
G»CGTA=CO C«=OC==TOA C3TT0TATTC T^CCGAAOCT CT^CGCAOCO ««„ACCAC 
CATCAACCAA AAC.CCGATT CCCCCATCTA CCCACGCAAC A^ACGCCO COA^COOT 
CTA^TGAG GTCOTOTCA OCOTCCTCTA CGCCCCA^ TTCTTTOATC GGATGOACGG 
GCGCAT^ GGGTATGTGG ACGCCTTO^ GGCTCGCTCT CCCCAAGTCG AACTGTCCC 
OATCATGGAG GAAACCGCTG TGGCCAAAGA ^GGATCACA OCCC^TCAG ATCAGTA.CA 
^GCACGGCA GGAGGTCGTA TATCAGTTTG GCCCGCTCCT OCCACTACCA CGGCGGTGAC 
AGTTGAAGGA ATGCGATGGG CACAAGCCTT CGCCCGTGAT COGGCCGTAA TGTGGACGCT 
TCACATGGCG GAGAGCGATC ATGATGAGCG GATTCATGGG ATGAGTCCCG CCOACTACAT 
GGAGTGTTAC GGACTCTTGG ATOAGCGTCT GCAGGTCGCG CATTGCG«=T ACTTTGACCG 
GAAGGATGTT CGGCTGCTGC ACCGCCACAA ,GTG«.GGTC GCGTCGCACG TTG^CAA 
TGCCTACCTC GGCTCAGGGG TGGCCCCCGT GCCAOAGAW GTGGAGCGCG GCA«=GCCGT 
0«K:AnGGA ACAGATAACG GGAATAGTAA TGACTCCGCA AACAIGATCG GAGACATGAa' 
GTTTATGGCC CA^TTCACC GCGCGG^CA TCCGGA^CG GACGTGCTOA CCCCAGACAA^^ 
GATTCTTGAA ATGGCGACGA TCGATGGGGC GCGTTCGTTG GGAATGGACC ACGAGATTGG 1380

TTCCATCGAA ACCGGCAAGC GCGCGGACCT TATCCTGCTT GACCTGCGTC ACCTCAGACG 1440 

ACTCTCACAT CATTTGGCGG CCACGATCGT GTTTCAGGCT TACGGCAATG AGGTGGACAC 1500 

TGTCCTGATT GACGGAAACG TTGTGATGGA GAACCGCCGC TTGAGCTTTC TTCCCCCTGA 1560

ACGTGAGTTG GCGTTCCTTG AGGAAGCGCA GAGCCGCGCC ACAGCTATTT TGCAGCGGGC 1620 

GAACATGGTG GCTAACCCAG CTTGGCGCAG CCTCTAGGAA ATGACGCCGT TGCTGCATCC 1680 

GCCGCCCCTT GAGGAAATCG CTGCCATCTT GGCGCGGCTC GGATTGGGGG GCGGACATGA 1740

CCTTGATGGA TACAGAATTG CCATGAATGC GGCACTTCCG TCCTTCGCTC GTGTGGAATC 1800 

GTTGGTAGGT GAGGGTCGAC TGCGGGCGCC AGCTTCCCGA AGAGGTGAAA GGCCCGAG 1858 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 473 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Glu Ser Val Pro Pro Pro Ala Asp
35 40 45

Arg Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
85 90 95

Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu Tyr Cys
100 105 110

Ala Glu Ala Val Arg Ser Gly lie - Thr: . T^^^ 

ll'^ ■''•120 ■ ■ -y '- ; 125' ' T 
Ser .la XU ryr P.o Oly lie Olu Ma Ala Met Ma Val ryr O.y

»s ™ TsJ «n -P -9 Hee 

oiv xu o.„ c,, 

^"^^ 175 

Gin Val Glu Leu Cvs Sa*- ^ 

180 Ala val Ala Lys Asp 

190 

ne XJ, ^ 

205 



lie Ser Val Trp Pro Ala ?v-i mi_ 

rp Ma Pro Ma Thr Thr Thr Ma Val Thr Val Glu 

220 

«y «.t ^ Trp „, ol„ «. p,, ^ 
Thr Uu His ^ 21 



250 



ser Pro Ala Glu Tyr Met Glu Cys Tyr Gly 



255 



265 



Leu Leu Asp Glu Arg Leu 



270 



Gin Val Ala His Cys Val Tvr- d>,« n« * 

275 ^ "^"^ ""^^ Lys Asp Val Arg Leu Leu 

285 

Hi. His val .y. v,I «. s„ cl„ val v.l Ser ^„ „a ry. 

300 

^ju 01, ..r 01, val Ma .ro Val Pro Cl„ Met Val olu *r, 01, «et 

315 

"a val 01, lie 01, ^p ^„ ^„ ^„ 2 
"et ne 01, ^ „a, 

AT. ASP ^ val ™, p 31U .,..„e Olu „ar «a Thr 



360 



365 



He Asp Gly Ala Arg Ser Leu Gly Met 



375 



Asp His Glu lie Gly Ser lie 



380 



Clu Thr Ol, ^ , ^ „^ ^ ^ 

400 

^ AT, :.u ser „1. „1. ^„ ,, . 

410 



Gly Asn Glu val: Asp Thr Val teu hV Ak^ gI^ Asrf 



425 



Val Val Met Glu
430

Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe Leu
435 440 445

Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn Met
450 455 460

Val Ala Asn Pro Ala Trp Arg Ser Leu
465 470

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1808 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GCGAGCATGG TGACCTTGAC GCCGCTCTTT TCGTTCTCTT TGTTGAACTG CACGCGAAAG 60 

GCTTCCAGGT CGGTGATGTC CGCGTCGTCG TGGTTGGTGA CGTGCGGGAT GACCACCCAG 120 

TTGCGGTGCA GGTTTTTCGA TGGCATAATA TCTGCGTTGC GACGTGTAAC ACACTATTGG 180 

AGACATATCA TGCAAACGCT CAGCATCCAG CACGGTACCC TCGTCACGAT GGATCAGTAC 240 

CGCAGAGTCC TTGGGGATAG CTGGGTTCAC GTGCAGGATG GACGGATCGT CGCGCTCGGA 300 

GTGCACGCCG AGTCGGTGCC TCCGCCAGCG GATCGGGTGA TCGATGCACG CGGCAAGGTC 360 

GTGTTACCCG GTTTCATCAA TGCCCACACC CATGTGAACC AGATCCTCCT GCGCGGAGGG 420 

CCCTCGCACG GGCGTCAATT CTATGACTGG CTGTTCAACG TTGTGTATCC GGGACAAAAG 480 

GCGATGAGAC CGGAGGACGT AGCGGTGGCG GTGAGGTTGT ATTGTGCGGA AGCTGTGCGC 540

AGCGGGATTA CGACGATCAA CGAAAAGGCC GATTCGGCCA TCTACCCAGG CAACATCGAG 600 

GCCGCGATGG CGGTCTATGG TGAGGTGGGT GTGAGGGTCG TCTACGCCCG CATGTTCTTT 660 

GATCGGATGG ACGGGCGCAT TCAAGGGTAT GTGGACGCCT TGAAGGCTCG CTCTCCCCAA 720 

GTCGAACTGT GCTCGATCAT GGAGGGAACG GCTGTGGCCA AAGATCGGAT CACAGCCCTG 780 

TCAGATCAGT ATCATGGCAC GGCAGGAGGT CGTATATCAG TTTGGCCCGC TCCTGCCACT 840



ACCACGGCGG TGACAGTTGA AGGAATGCGA TGGGCACa^G OCTITCGCCCq^,'^ 
GTAATGTGGA CGCTTCACAT GGCGGAGAGC (3ATCATGATG 'AGOGGATO 
=~ ^^^^^^^

o-c^ - 

:::r ~ ~ ~ — ~ 

CGCGGCATGG CCGTGGGCAT Trraaono.^ 

TGGMCAGAT AACGGGAATA GTMTGACTC CGTAAACATG 



1020 
1080 
1140 
1200 



' GGCCCATATT 


CACCGCGCGG 


TGCATCGGGA 


TGCGGACGTG 


1260 


TGAAATGGCG 


ACGATCGATG 


GGGCGCGTTC 


GTTGGGAATG 


1320 


CGAAACCGGC 


AAGCGCGCGG 


ACCTTATCCT 


GCTTGACCTG 


1380 


TCACCATCAT 


TTGGCGGCCA 


CGATCGTGTT 


TCAGGCTTAC 


1440 


CCTGATTGAC 


GGAAACGTTG 


TGATGGAGAA 


CCGCCGCTTG 


1500 



1560 
1620 
1680 
1740 



~ ~ ~ ™ 

i~ — ~ ~ ™c 
~o ~„ ^^^^ ^ 

r^r. ^^^^ 

AGTGAAAG TTCCCGAAQA isoo 

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1846 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



60 
120 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
OAGCGCCGCC ACAGCAGCCT TGATCATGAA GGCGAGCATG GTGACC.GA CGCCGCTCTT 
-OTTCTCT TTGTTGAACT GCACGCGAAA GGC.CCAGG TCGGTGATGT CCGCGTCGTC 
~ ACGTGCGGGA „CCA GTTGCGGTGC AGGT^CG ATGGCGTAAT^ .0 
ATCTGCGTTG CGACGTGTAA CACACTA7<rr r.^. ' ^ " ' " -^K^: .. ' ' 

AA CACACTATTG GAGACATATC ATGCAAACGC TCAGCATcdi'^' ' , 
GGATCGGGTG ATCGATGCAC GCGGCAAGGT CGTGTTACCC GGTTTCATCA ATGCCCACAC 420

CCATGTGAAC CAGATCCTCC TGCGCGGAGG GCCCTCGCAC GGGCGTCAAT TCTATGACTG 480 

GCTGTTCAAC GTTGTGTATC CGGGACAAAA GGCGATGAGA CCGGAGGACG TAGCGGTGGC 540 

GGTGAGGTTG TATTGTGCGG AAGCTGTGCG CAGCGGGATT ACGACGATCA ACGAAAACGC 600 

CGATTCGGCC ATCTACCCAG GCAACATCGA GGCCGCGATG GCGGTCTATG GTGAGGTGGG 660 

TGTGAGGGTC GTCTACGCCC GCATGTTCTT TGATCGGATG GACGGGCGCA TTCAAGGGTA 720 

TGTGGACGCC TTGAAGGCTC GCTCTCCCCA AGTCGAACTG TGCTCGATCA TGGAGGAAAC 780 

GGCTGTGGCC AAAGATCGGA TCACAGCCCT GTCAGATCAG TATCATGGCA CGGCAGGAGG 840

TCGTATATCA GTTTGGCCCG CTCCTGCCAC TACCACGGCG GTGACAGTTG AAGGAATGCG 900 

ATGGGCACAA GCCTTCGCCC GTGATCGGGC GGTAATGTGG ACGCTTCACA TGGCGGAGAG 960 

CGATCATGAT GAGCGGATTC ATGGGATGAG TCCCGCCGAT TACATGGAGT GTTACGGACT 1020

CTTGGATGAG CGTCTGCAGG TCGCGCATTG CGTGTACTTT GACCGGAAGG ATGTTCGGCT 1080 

GCTGCACCGC CACAATGTGA AGGTCGCGTC GCAGGTTGTG AGCAATGCCT ACCTCGGCTC 1140 

AGGGGTGGCC CCCGTGCCAG AGATGGTGGA GCGCGGCATG GCCGTGGGCA TTGGAACAGA 1200 

TAACGGGAAT AGTAATGACT CCGTAAACAT GATCGGAGAC ATGAAGTTTA TGGCCCATAT 1260 

TCACCGCGCG GTGCATCGGG ATGCGGACGT GCTGACCCCA GAGAAGATTC TTGAAATGGC 1320 

GACGATCGAT GGGGCGCGTT CGTTGGGGAT GGACCACGAG ATTGGTTCCA TCGAAACCGG 1380 

CAAGCGCGCG GACCTTATCC TGCTTGACCT GCGTCACCCT CAGACGACTC CTCACCATCA 1440 

TTTGGCGGCC ACGATCGTGT TTCAGGCTTA CGGCAATGAG GTGGACACTG TCCTGATTGA 1500 

CGGAAACGTT GTGATGGAGA ACCGCCGCTT GAGCTTTCTT CCCCCTGAAC GTGAGTTGGC 1560 

GTTCCTTGAG GAAGCGCAGA GCCGCGCCAC AGCTATTTTG CAGCGGGCGA ACATGGTGGC 1620 

TAACCCAGCT TGGCGCAGCC TCTAGGAAAT GACGCCGTTG CTGCATCCGC CGCCCCTTGA 1680 

GGAAATCGCT GCCATCTTGG CGCGGCTCGG ATTGGGGGGC GGACATGACC TTGATGGATA 1740 

CAGAATTGCC ATGAATGCGG CACTTCCGTC CTTCGCTCGT GTGGAATCGT TGGTAGGTGA 1800 

GGGTCGACTG CGGGCGCCAG CTTCCCGAAG AAGTGAAAGG CCCGAG 1846 

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 601 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
Ala Ser Met Val Thr Leu Thr 



Pro Leu Phe Ser Phe Ser Leu Leu Asn 

15 



C.S TK. ^ 



30 



Va. ... ^^^^ 
«e se. ^ ^ ^ 



60 



100 - Pro Ala Asp Arg 

110 



Gin Thr Leu Ser rir. t»i 

He Gin H.S Gly Thr Leu Val Thr Met Asp Gl„ Tyr 

75 

vax ^ „^ ^ 

Val Ala Leu Gly Val His Ala Glu 
100 

Val He Asp Ala Arg Gly Lvs Val v=i t 

g ly Lys Val Val Leu Pro Gly Phe He Asn Ala 

^■'^ 125 
- His V,l ^„ ox„ n. Uu .eu CX, CX. P,o ... His OX. 

140 

0X„ PH. ^„ ^„ 

155 .i^ 

160 

Ala Met Arg Pro Glu Asp Val Ala Val at. v i . 

P Ala Val Ala Val Arg Leu Tyr Cys Ala 

^ 175 



Clu «. vax .e. «x, xXe T.. .H. xXe ^ .x„ ^„ 

190 

^ "a M.. v,X ^ ox. 0X„ 

''^^ 205 

-1 OX. vax val Val 

220 

Gly Arg He Gin Gly Tvr Val «i , 

225 ^ ""^^ '^^P Ala Leu Lys Ala Arg Ser Pro Qln 

235 ■■■ ^ • ••■ • . ■• 

.Val Glu Leu Cys Ser He Met Glu Gly Thr ^Ala Val^^ -^^^^ ' ^ 

245 ^ ocn I-ys Asp Arg 
Ile Thr Ala Leu Ser Asp Gln Tyr His Gly Thr Ala Gly Gly Arg Ile
260 265 270

Ser Val Trp Pro Ala Pro Ala Thr Thr Thr Ala Val Thr Val Glu Gly
275 280 285

Met Arg Trp Ala Gln Ala Phe Ala Arg Asp Arg Ala Val Met Trp Thr
290 295 300

Leu His Met Ala Glu Ser Asp His Asp Glu Arg Ile His Gly Met Ser
305 310 315 320

Pro Ala Glu Tyr Met Glu Cys Tyr Gly Leu Leu Asp Glu Arg Leu Gln
325 330 335

Val Ala His Cys Val Tyr Phe Asp Arg Lys Asp Val Arg Leu Leu His
340 345 350

Arg His Asn Val Lys Val Ala Ser Gln Val Val Ser Asn Ala Tyr Leu
355 360 365

Gly Ser Gly Val Ala Pro Val Pro Glu Met Val Glu Arg Gly Met Ala
370 375 380

Val Gly Ile Gly Thr Asp Asn Gly Asn Ser Asn Asp Ser Val Asn Met
385 390 395 400

Ile Gly Asp Met Lys Phe Met Ala His Ile His Arg Ala Val His Arg
405 410 415

Asp Ala Asp Val Leu Thr Pro Glu Lys Ile Leu Glu Met Ala Thr Ile
420 425 430

Asp Gly Ala Arg Ser Leu Gly Met Asp His Glu Ile Gly Ser Ile Glu
435 440 445

Thr Gly Lys Arg Ala Asp Leu Ile Leu Leu Asp Leu Arg His Pro Gln
450 455 460

Thr Thr Pro His His His Leu Ala Ala Thr Ile Val Phe Gln Ala Tyr
465 470 475 480

Gly Asn Glu Val Asp Thr Val Leu Ile Asp Gly Asn Val Val Met Glu
485 490 495

Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe Leu
500 505 510

Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn Met
515 520 525

Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu His
530 535 540

Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Arg Leu Gly Leu
545 550 555 560
Oly Oly His ASP Leu Asp Gly Arg He Al

570 ^^^'^ ^" Ala Ala 

Leu Pro ser Phe Ala Arg Val gi„ c 

sis "-^^ Arg Leu 



ArgAlaProAlaSer.ArgArgSerolu 

600 



585 

590 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 614 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Ser Ala Ala Thr Ala r 

" «a „^ 

Thr Pro Leu phe Ser Phe fi.^ i , 

ser .eu Leu M„ Cy. Thr Lys «e ser 

- - ,he Phe 0.. V. „e l 

60 ^ 

Arg Val Thr His Tvr Tr^ « 

65 Arg His He Met Gin Thr t o 

70 Leu Ser lie Gin 

- V. 2^ 

110 

Ala Glu Ser Val d*-^ r» 

er val Pro Pro Pro Ala Asp Arg Val » 

120 Ala Arg ciy 

125 

Lys Val Val Leu Pro Glv Pho ti 

140 

v^/^c^-eu^Leu Arg,Gly Glv >. 

145 • . Ser .^s Gly Aro Cln t»v.' w v 

.■ . ■ XJ-.v -150 . . : . ^1 ^-J^^^ 

•.^?.^Phe^:Asn,yal,yai.;Tyr,^^^^^ . . ■ 
Val Ala Val Ala Val Arg Leu Tyr Cys Ala Glu Ala Val Arg Ser Gly
180 185 190

Ile Thr Thr Ile Asn Glu Asn Ala Asp Ser Ala Ile Tyr Pro Gly Asn
195 200 205

Ile Glu Ala Ala Met Ala Val Tyr Gly Glu Val Gly Val Arg Val Val
210 215 220

Tyr Ala Arg Met Phe Phe Asp Arg Met Asp Gly Arg Ile Gln Gly Tyr
225 230 235 240

Val Asp Ala Leu Lys Ala Arg Ser Pro Gln Val Glu Leu Cys Ser Ile
245 250 255

Met Glu Glu Thr Ala Val Ala Lys Asp Arg Ile Thr Ala Leu Ser Asp
260 265 270

Gln Tyr His Gly Thr Ala Gly Gly Arg Ile Ser Val Trp Pro Ala Pro
275 280 285

Ala Thr Thr Thr Ala Val Thr Val Glu Gly Met Arg Trp Ala Gln Ala
290 295 300

Phe Ala Arg Asp Arg Ala Val Met Trp Thr Leu His Met Ala Glu Ser
305 310 315 320

Asp His Asp Glu Arg Ile His Gly Met Ser Pro Ala Asp Tyr Met Glu
325 330 335

Cys Tyr Gly Leu Leu Asp Glu Arg Leu Gln Val Ala His Cys Val Tyr
340 345 350

Phe Asp Arg Lys Asp Val Arg Leu Leu His Arg His Asn Val Lys Val
355 360 365

Ala Ser Gln Val Val Ser Asn Ala Tyr Leu Gly Ser Gly Val Ala Pro
370 375 380

Val Pro Glu Met Val Glu Arg Gly Met Ala Val Gly Ile Gly Thr Asn
385 390 395 400

Asn Gly Asn Ser Asn Asp Ser Val Asn Met Ile Gly Asp Met Lys Phe
405 410 415

Met Ala His Ile His Arg Ala Val His Arg Asp Ala Asp Val Leu Thr
420 425 430

Pro Glu Lys Ile Leu Glu Met Ala Thr Ile Asp Gly Ala Arg Ser Leu
435 440 445

Gly Met Asp His Glu Ile Gly Ser Ile Glu Thr Gly Lys Arg Ala Asp
450 455 460

Leu Ile Leu Leu Asp Leu Arg His Pro Gln Thr Thr Pro His His His
465 470 475 480

«. «a Xle V.1 ..e cXn Ma cx, v.l 

490 

val ^„ ^^^^^ ^^^^ ^^^^ 

P„ P„ ^„ 

520 

Ala Ma ne L.u 31„ ^ Ma «„ „« VaX M. p„ 



535 



540 



^9 s,r L,u Glu „« p„ ^„ 



550 



555 



Glu Glu 
560 



XI. AXa Ma xle ^„ m. ^ ^„ 

A.P «1. XX. Ma Met *s„ Ma Ma p„ Z 

590 

V.1 01„ se. ..u val Cl, oij, ^ ^„ M. P„ Ma Ser 

600 - ^ 



605 



Arg ser Glu Arg Pro Glu 
610 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 545 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
«GTATC=« G«TTc™^ 3C=C^C«C «<:.0CCNT3 ^TCAT^GG C»C«T«.T 
«CCTB0AC8 CCGTNTTTW OTTNTTrTTT CTTO^^ ^CC^^ TTCCAOOTC 
OTGATOTCCG CCTCOTCGIC OTTGGTOACG TaCX^OGATOA CCACCCA^T 6^0^ 

OCATAA^T. «C=™=<» OG^^A^C ACTA^T^ , , 

CAAACSCTCA =«-=C«^ CG^TACCCTC OTCACCAT^ -ATCACTACC^^.C^^^ , , 
«aGATA=CT GGGTTCAOOT 8CA««T8GA CGOATCGTCO CGCTC«iA« ^GCACaOCGAc' " ' 



60 
120 
180 



360 
TCGGTGCCTC CGCCAGCGGA TCGGGTGATC GATGCACGCG GCAAGGTCGT GTTACCCGGT 420

TTCATCAATG CCCACACCCA TGTGAACCAG ATCCTCCTGC GCGGAGGGCC CTCGCACGGG 480

CGTCAATTCT ATGACTGGCT GTTCAACGTT GTGTATCCGG GACAAAAGGC GATGAGACCG 540

GAGGA 545



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 499 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CCTGCGCGGA GGGCCTCCGC ACGGGCGTCA ATTCTATGAC TGGCTGTTCA ACGTTGTGTA 60 

TCCGGGACAA AAGGCGATGA GACCGGAGGA CGTAGCGGTG GCGGTGAGGT TGTATTGTGC 120 

GGAAGCTGTG CGCAGCGGGA TTACGACGAT CAACGAAAAC GCCGATTCGG CCATCTACCC 180 

AGGCAACATC GAGGCCGCGA TGGCGGTCTA TGGTGAGGTG GGTGTGAGGG TCGTCTACGC 240 

CCGCATGTTC TTTGATCGGA TGGACGGGCG CATTCAAGGG TATGTGGACG CCTTGAAGGC 300 

TCGCTCTCCC CAAGTCGAAC TGTGCTCGAT CATGGAGGAA ACGGCTGTGG CCAAAGATCG 360 

GATCACAGCC CTGTCAGATC AGTATCATGG CACGGCAGGA GGTCCTATAT CAGTTTGGCC 420 

CGCTCCTGCC ACTACCACGG CGGTGACATT TAAANGAATC CATGGGCCAA CCTCCCCCGT 480 

GATCCGGCGG TAATGTGAC 499
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
™™ ^^^^ ^^^^^^^

cccc«.,e. ^^^^^ ^^^^^^ 

.»c..3.c ^^^^^^^^ 

GCTGACCCCA OAOAAGATTO TTGAAATOGC 

TGAAATGGC GACGATCGAT GSGSCGCCTT TCGITCOSGA 
TGGACCACGA GAX.GG^C A.CAAACCG CCAAGC^. 

-CGXCACCO XCAGACGAC CC.CACCA. a™3CGGC CACGA.CG. „.cagGC^ 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 443 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



60 
120 
180 
240 

300 
360 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
-CCACGAX cg™ .,„,^„ ^^^^ ^^^^ 

.cg™ OGAGAACCGC CGC^AGC 

"-GGAAGC GCAGAGCCGC GCCACAGCTA TTTTGCATCG GCCGAAACAT GG^aaC 
CCAGCX^C CCAGCCXCA GGAAA^cg CGG™.«e A^CCGCC ccx^aggaA 
™CA .C^GCG. GCXCG^™ «^eGGAC AX^CC^A ™« 

attgccatga A^CGGCACT xccgtccttg gctcgtgtgg aatcg^,, aggtgaggg. 

CGAC^CGGG CGCCAGCTX. CCGAAGAGG. CAAAGCCCGA GGAXCCX^a GAGXCCGA„ 

TTTCCGATGT CATCACCGGC GCG 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 505 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



60 
120 
180 
240 
300 
360 
420 
443 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CCTGCGCGGA GGCCTCCGCA CGGGCGTCAA TTCTATGACT GGCTGTTCAA CGTTGTGTAT 60 
CCGGGACAAA AGGCGATGAG ACCGGAGGAC GTANCGGTGG CGGTGAGGTT GTATTGTGCG 120 
GAAGCTGTGC GCAGCGGGAT TACGACGATC AACGAAAACG CCGATTCGGC CATCTACCCA 180 
GGCAACATCG AGGCCGCGAT GGCGGTCTAT GGTGAGGTGG GTGTGAGGGT CGTCTACGCC 240 
CGCATGTTCT TTGATCGGAT GGACGGGCGC ATTCAAGGGT ATGTGGACGC CTTGAAGGCT 300 
CGCTCTCCCC AAGTCGAACT GTGCTCGATC ATGGAGGAAA CGGCTGTGGC CAAAGATCGG 360 
ATCACANCCC- TGTCAGATCA NTATCATGGC ACGGCANGAG GTCCTATATC ANTTTGGCCC 420 
GCTCCTGCCA CTACCACNGC GGTGACATTT NAANGAATTC CATNGGCACA ACCTTCCCCC 
GTGATCNGGC GGTAATGTNG ACCCA 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 144 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Pro His Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Leu Tyr Pro
1 5 10 15

Gly Gln Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu
20 25 30

Tyr Cys Ala Glu Ala Val Arg Ser Gly Ile Thr Thr Ile Asn Glu Asn
35 40 45

Ala Asp Ser Ala Ile Tyr Pro Gly Asn Ile Glu Ala Ala Met Ala Val
50 55 60

Tyr Gly Glu Val Gly Val Arg Val Val Tyr Ala Arg Met Phe Phe Asp
65 70 75 80

Arg Met Asp Gly Arg Ile Gln Gly Tyr Val Asp Ala Leu Lys Ala Arg
ser Pro Gin Val Glu Leu Cys Ser He Met Glu Glu Thr iMkJVal Ala^^^^^^^ 
- - - - - - Oln ^

125 

Gly Arg He ser Val Tm Prr, 1,1 „ 

Txp Pro Ala Pro Ala Thr Thr Thr Ala Val Thr 

140 

(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



ex. ^ ^ ^ ^ 

- - vl V. V. ^„ 

:r - - - J- - 

^ ir "° "a «e. vax 

60 



Tyr Gly Glu Val Qly Val Arg Val Val Tvr . 
^5 70 ^3 Met Phe Phe 

75 

Arg Met Asp Gly Arg xie Gin Gly Tyr Val A.n n^v, 

85 ^ ^ Thr Leu Lys Ala 

-r Pro Gin Val Glu Leu Cys Ser He Met Glu Glu T.r Ala Val 



105 --- — Ala 

- -a Leu se. Asp Gin Tyr His Gly T.r Ala Gly 

lie ser Val Trp Pro Ala Pro Ala T.r T.r T^^ Ala Val T.r 

.140 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Pro His Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro
1 5 10 15

Gly Gln Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu
20 25 30

Tyr Cys Ala Glu Ala Val Arg Ser Gly Ile Thr Thr Ile Asn Glu Asn
35 40 45

Ala Asp Ser Ala Ile Tyr Pro Gly Asn Ile Glu Ala Ala Met Ala Val
50 55 60

Tyr Gly Glu Val Gly Val Arg Val Val Tyr Ala Arg Met Phe Phe Asp
65 70 75 80

Arg Met Asp Gly Arg Ile Gln Gly Tyr Val Asp Ala Leu Lys Ala Arg
85 90 95

Ser Pro Gln Val Glu Leu Cys Ser Ile Met Glu Glu Thr Ala Val Ala
100 105 110

Lys Asp Arg Ile Thr Ala Leu Ser Asp Gln Tyr His Gly Thr Ala Gly
115 120 125

Gly Arg Ile Ser Val Trp Pro Ala Pro Ala Thr Thr Thr Ala Val Thr
130 135 140



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 145 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



Ser Ws" Gl9 Ai^" bin Phe Asp Trp Leu" Phe Asn Val Leu Tyr'-Pro ■ .l ' 
«y CXn ^ ^^^^

^ c.. ^^^^ ^^^^^ 2 

-n 

60 

val T., o., ozu v.. ^ 

80 

A.. „e. ..p 0., 

se. V.1 2 val 

110 

Ala .3p ^ ^^^^^ 

Ciy ne s„ v,l ^ ^„ 2 Th. Ma Val 



140 

Thr 
145 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ser His Gly Arg Gin Phe .^r Asp Trp Leu Phe Asn Val Val ryr Pro 
Oly Oln Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Leu 
ryr CVS Ala Glu Ala Val Arg Ser Oly He T.r Thr He l Glu Asn 



45 



""^ To' ^. ^'y^ ne Glu AirAl^Met Ala V.l' 



Tyr Gly Glu Val Gly Val Arb Val i^i ,v ^ i 

65 ^ ^^^^^^^.^^ Ala Arg Met Phe Phe Asp' 

■ ■ ''^ 80 ■ 
Arg Met Asp Gly Arg Ile Gln Gly Tyr Val Asp Ala Leu Lys Ala Arg
85 90 95

Ser Pro Gln Val Glu Leu Cys Ser Ile Met Glu Glu Thr Ala Val Ala
100 105 110

Lys Asp Arg Ile Thr Ala Leu Ser Asp Gln Tyr His Gly Thr Ala Gly
115 120 125

Gly Arg Ile Ser Val Trp Pro Ala Pro Ala Thr Thr Thr Ala Val Thr
130 135 140



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1633 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CGCGAAAGGC TTCCAGGTCG GTGATGTCCG CGTCGTCGTG GTTGGTGACG TGCGGGATGA 60 

CCACCCAGTC GCGGTGCAGG TTTTTCGATG GCATAATATC TGCGTTGCGA CGTGTAACAC 120 

ACTATTGGAG ACATATCATG CAAACGCTCA GCATCCAGCA CGGTACCCTC GTCACGATGG 180

ATCAATACCG CAGAGTCCTT GGGGATAGCT GGGTTCACGT GCAGGATGGA CGGATCGTCG 240 

CGCTCGGAGT GCACGCCAAG TCGGTGCCTC CGCCAGCGGA TCGGGTGATC GATGCACGCG 300 

GCAAGGTCGT GTTACCCGGT TTCATCAATG CCCACACCCA TGTGAACCAG ATCCTCCTGC 360 

GCGGAGGGCC CTCGCACGGG CGTCAATTCT ATGACTGGCT GTTCAACGTT GTGTATCCGG 420 

GACAAAAGGC GATGAGACCG GAGGACGTAG CGGTGGCGGT GAGGTTGTAT TGTGCGGAAG 480 

CTGTGCGCAG CGGGATTACG ACGATCAACG AAAACGCCGA TTCGGCCATC TACCCAGGCA 540 

ACATCGAGGC CGCGATGGCG GTCTATGGTG AGGTGGGTGT GAGGGTCGTC TACGCCCGCA 600 
TGTTCTTTGA TCGGATGGAC GGGCGCATTC AAGGGTATGT GGACGCCTTG AAGGCTCGCT 660
CTCCCCAAGT CGAACTGTGC TCGATCATGG AGGAAACGGC TGTGGCCAAJL'^GATCGS^fCA''''^^ 
CAGCCCTGTC. AGATCAGTAT^.CATC^ CAGGAGGTCG TATATO^GTr^?^ 
' CTGCCACTAC CACGGCGGTG ACAGTTGAAG .GAATGCGATG GGCACAAGCC "TTCGCCC^^^ 
"CCOCO.. ^^^^^

~cc c=c„ 

-~ o.^. ^^^^^^ 

~c. c..^. 
^«*c„.. 

.«coc.« «™ 

~<=. cc.„..„ ^^^^^^ ^^^^ 

™.cc.cc. ^CCCC. «o„ 

-=™c.o c«^, ^^^^ ^ 

-™ cc™„„^ ^ 
oc=c«„cc ^^^^ ^^^^ 

GGCTCGGATT GGG 1633
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1598 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

r^r^r^ xo.™ <.^,„„ ^^^^^^ 

^^^^ 

CCOTC^O. ^^^^^ 

c™== --=- ~»o.cc^,.S^^^,... ...... ^^^^^^^ 

«~ cc^c:.cc<. .™e,«<=«=c^c. .^^....3,..:::: 

— ~« xcc.c«c«,««.^,^ ■ :^ lit- 



GCGGTGAGGT TGTATTGTGC GGAAGCTGTG CGCAGCGGGA TTACGACGAT CAACGAAAAC 480

GCCGATTCGG CCATCTACCC AGGCAACATC GAGGCCGCGA TGGCGGTCTA TGGTGAGGTG 540 

GGTGTGAGGG TCGTCTACGC CCGCATGTTC TTTGATCGGA TGGACGGGCG CATTCAAGGG 600 

TATGTGGACG CCTTGAAGGC TCGCTCTCCC CAAGTCGAAC TGTGCTCGAT CATGGAGGAA 660

ACGGCTGTGG CCAAAGATCG GATCACAGCC CTGTCAGATC AGTATCATGG CACGGCAGGA 720 

GGTCGTATAT CAGTTTGGCC CGCTCCTGCC ACTACCACGG CGGTGACAGT TGAAGGAATG 780 

CGATGGGCAC AAGCCTTCGC CCGTGATCGG GCGGTAATGT GGACGCTTCA CATGGCGGAG 840

AGCGATCATG ATGAGCGGAT TCATGGGATG AGTCCCGCCG AGTACATGGA GTGTCACGGA 900 

CTCTTGGATG AGCGTCTGCA GGTCGCGCAT TGCGTGTACT TTGACCGGAA GGATGTTCGG 960 

CTGCTGCACC GCCACAATGT GAAGGTCGCG TCGCAGGTTG TGAGCAATGC CTACCTCGGC 1020 

TCAGGGGTGG CCCCCGTGCC AGAGATGGTG GAGCGCGGCA TGGCCATGGG CATTGGAACA 1080 

GATAACGGGA ATAGTAATGA CTCCGTAAAC ATGATCGGAG ACATGAAGTT TATGGCCCAT 1140 

ATTCACCGCG CGGTGCATCG GGATGCGGAC GTGCTGACCC CAGAGAAGAT TCTTGAAATG 1200 

GCGACGATCG ATGGGGCGCG TTCGTTGGGA ATGGACCACG AGATTGGTTC CATCGAAACC 1260 

GGCAAGCGCG CGGACCTTAT CCTGCTTGAC CTGCGTCACC CTCAGACGAC TCCTCACCAT 1320 

CATTTGGCGG CCACGATCGT GTTTCAGGCT TACGGCAATG AGGTGGACAC TGTCCTGATT 1380 

GACGGAAACG TTGTGATGGA GAACCGCCGC TTGAGCTTTC TTCCCCCTGA ACGTGAGTTG 1440 

GCGTTCCTTG AGGAAGCGCA GAGCCGCGCC ACAGCTATTT TGCAGCGGGC GAACATGGTG 1500 

GCTAACCCAG CTTGGCGCAG CCTCTAGGAA ATGACGCCGT TGCTGCATCC GCCGCCCCTT 1560 

GAGGAAATCG CTGCCATCTT GGCGCGGCTC GGATTGGG 1598 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1586 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ACGTGCGGGA TGACCACCCA GTTGCGGTGC AGGTTTTTCG ATGGCGTAAT ATCTGCGTTG 60
...... ..„e ™ecc cc.cco.cc

„C. CCCC.C.C.C CCCCC.. CC.CCO.C. cccccc. 

O.CCC..CO „o ...cccc c„c cccocccc cc™ 

.XC„ CCCCC.CC. CC.CX.CCC CC..C.C. „C.C CC.C.C.C 
C.C..CCXCC .CCCCCC.CC CCCC.CCC.C CCCCCC. .C..C.CC CC„C 
~C CCCC.C... CCCC.C.C. CC.C.CCCC „C CC.C.CC.C 

™ccc „ccc c„ .„c. .cc...ecc cc^ccccc 
.™ cc„. 

O.C.CCCCC CC™ „c CCCCCCCC .C.CCC.. .ccccccc 

^c^cccc ccccccc. „c .CC.CC..C .CC.CC..C cccc^ec 

-O..CCC. .C.C.CCCC C.C.C..C.C ..C..CCC. cccc.cc.cc .CCX....e. 

CTTTCCCCCC CTCCTCCC.C T.CC.CCCCC GTC.CCTTC ^r.. 

^^iGACACTTG W^CCMTGCG ATGGGCACWl 
GCCTTCCCCC GTC.TCCGGC GGT.ATCTCG ACGCTTC.CA rrnr^ 

^ ACGCTTCACA TGGCGG.GAG CGATCTGAT 
G.GCGGATTC ATGGGATG.C TCCCGCCear n^n^,^ 

TCCCGCCCAG TACATGG.GT CTT.CCC.CT CTTGCATOAC 
CC.C.GCAGC .CGCCCA.G CG.C.AC.. CACCCCAACC A.G„ 
CAC.G.GA AGC.CGCGXC GCACG.c.G ACC^ ACGGG^CC 
CCCG.GCCC AGA.CC.CC. CC.CGGC..C CCCG.CCCC. .CC.AC.C. ...eccc.. 

-.A.GAC CCG.AA.CA. GA.CCGAC.C A.GAAC... .CACCCCCCC 

-OCA.CCGG A.GCCCACC. GCGACCCCA GAGAACA..C rr.^.^, 
-OCCCG. CG^CA.. GG.CCACCAC ..CC.CC .CG...CCCC CCCCCCCC 
OACC..A.CC .GC..GACC. GCC.CACCC. CACACCAC.C C.CACCA.C ...CCCCCCC 
-OA.CG.C. „CACCC.. CGGCA..C.C G.GC.C.C.C .CC.C..C. CGC„ 
O.C..GG.GA ACCGCCCC. G.CC..C.. CCCCC.G.AC G.AC.CCC G..CC.CC 
OAAGCGCAC. GCCGCCCC.C .GC......C e.CCGCGCCA .C.CG.CCC ...CCC.CC. 

-CGCGCC .C.AGCA.. C.CGCCG..C CCCCCCC .eCCC.C. GG„. .^SO 
GCC.C.TCG CGCGGCTCGG ...GCG 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1597 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CGTGGTTGGT GACGTGGGGG ATGACCACCC AGTCGCGGTG CAGGTTTTTC GATGGCATAA 60 

TATCTGCGTT GCGACGTGTA ACACACTATT GGAGACATAT CATGCAAACG CTCAGCATCC 120 

AGCACGGTAC CCTCGTCACG ATGGATCAGT ACCGCAGAGT CCTTGGGGAT AGCTGGGTTC 180 

ACGTGCAGGA TGGACGGATC GTCGCGCTCG GAGTGCACGC CGAGTCGGTG CCTCCGCCAG 240 

CGGATCAGGT GATCGATGCA CGCGGCAAGG TCGTGTTACC CGGTTTCATC AATGCCCACA 300 

CCCATGTGAA CCAGATCCTC CTGCGCGGAG GGCCCTCGCA CGGGCGTCAA TTCCATGACT 360 

GGCTGTTCAA CGTTGTGTAT CCGGGACAAA AGGCGATGAG ACCGGAGGAC GTAGCGGTGG 420 

CGGTGAGGTT GTATTGTGCA GAAGCTGTGC GCAGCGGGAT TACGACGATT AACGAAAACG 480 

CCGATTCGGC CATCTACCCA GGCAACATCG AGGCCGCGAT GGCGGTCTAT GGTGAGGTGG 540 

GTGTGAGGGT CGTCTACGCC CGCATGTTCT TTGATCGGAT GGACGGGCGC ATTCAAGGGT 600 

ATGTGGACGC CTTGAAGGCT CGCTCTCCCC AAGTCGAACT GTGCTCGATC ATGGAGGAAA 660 

CGGCTGTGGC CAAAGATCGG ATCACAGCCC TGTCAGATCA GTATCATGGC ACGGCAGGAG 720 

GTCGTATATC AGTTTGGCCC GCTCCTGCCA CTACCACGGC GGTGACAGTT GAAGGAATGC 780 

GATGGGCACA AGCCTTCGCC CGTGATCGGG CGGTAATGTG GACGCTTCAC ATGGCGGAGA 840 

GCGATCATGA TGGGCGGATT CATGGGATGA GTCCCGCCGA GTACATGGAG TGTTACGGAC 900 

TCTTGGATGA GCGTCTGCAG GTCGCGCATT GCGTGTACTT TGACCGGAAG GATGTTCGGC 960 

TGCTGCACCG CCACAATGTG AAGGTCGCGT CGCAGGTTGT GAGCAATGCC TACCTCGGCT 1020 

CAGGGGTGGC CCCCGTGCCA GAGATGGTGG AGCGCGGCAT GGCCGTGGGC ATTGGAACAG 1080 

ATAACGGGAA TAGTAATGAC TCCGTAAACA TGATCGGAGA CATGAAGTTT ATGGCCCATA 1140 

TTCACCGCGC GGTGCATCGG GATGCGGACG TGCTGACCCC AGAGAAGATT CTTGAAATGG 1200

CAACGATCGA TGGGGCGCGT TCGTTGGGAA TGGACCACGA GATTGGTTCC ATCGAAACCG 1260

GCAAGCGCGC GGACCTTATC CTGCTTGACC TGCGTCACCC TCAGACGACT CCTCACCATC 1320

ATTTGGCGGC CACGATCGTG TTTCAGGCTT ACGGCAATGA GGTGGACACT GTCCTGATTG 1380
.cco^cc. ;.ccoccoc. xc.ocx^cx xcccccroM cox„

..^,eoc.o .occocccc c.c™ .e.ccoooco ^„ 
cr^cccoc .occccoc ™^ ^^^^^^^^^ 

AGGAAATCGC TGCCATCTTG GCGCGGCTCG GATTGGG 1597
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1674 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
.TGACCT.GA CGCCGCTCTT I.CGTTCTCT ..GTTGAACT GCACGCGAAT GGC^CCAGT 
TCGAXGATGT CCGCGTCGTC GTGGTTGGTG ACGTCCGCGA T..CCACCCA GTCGCGGTGC 
acg™ ATGGCATAAT ATCTGCG^G CGACGTGTAA CACACTATTC GAGACATATC 
A^CAAACGC TCAGCATCCA GCACGGTACC CTCGTCACGA TGGATCAGTA CCGCAGAGTC 

cxtggggata gctgggttca cgtgcaggat ggacggatcg TCOCGCTCG AG„c 

CAGTCGGTGC CTCCGCCAGC GGATCGGGTG AI.GATGCAC GCGGCAAGGT CGTG^ACCC 

™tca atgcccacac ccatgtgaac cagatcctcc tgcgcggagg cctcgcacgg 
OCGTCAA.C tatgactggc tgttcaacgt tgtgtatccg ggacaaaagg cgatgagacc 

CGAGGACGTA GCGGTGGCGG TGAGGTTGTA TTGTGCGGAA GCTG^CGCA GCGGGAIXAC 

oacgatcaac gaaaacgccg attcggccat ctacccaggc aacatcgagg ccgcga^c 

COTCTATGGT GAGGTGGGTG „CGT CTACGCCCGC A^XXCTTTG ATCGGATGGA 

caggcgca^ caagggtatg tggacgcctt gaaggctcgc tctccccaag tcgaactgtg 

CXCGATCATG GAGGAAACGG CTGTGGCCAA AGATCGGATC ,CAGCCCT.T CAGAXCAGTA 
TCAXGGCACG GCAGGAGGTC GTAXAXCAGT ^GGCCCGCT CCX.CCACTA CCACGGCCGX-^^ 
OACAGXXGAA OGAA^CGAX GGGCACAAGC CTX.GCCCGX^OAXcbOGCGG 
CCXXCACATG GCGGAGAGCG AXCAXGA^ GCGGAXXCAT GGGAX^GTC CCGCCX^^^ 
. CAI.GAGXGX ^XGCAGGXC CCGCAX^O. ,CX;^C^^ 



CCGGAAGGAT ATTCGGCTGC TGCACCGCCA CAATGTGAAG GTCGCGTCGC AGGCTGTGAG 1080

CAATGCCTAC CTCGGCTCAG GGGTGGCCCC CGTGCCAGAG ATGGTGGAGC GCGGCATGGC 1140 

CGTGGGCATT GGAACAGATA ACGGGAATAG TAATGACTCC GTAAACATGA TCGGAGACAT 1200 

GAAGTTTATG GCCCATATTC ACCGCGCGGT GCATCGGGAT GCGGACGTGC TGACCCCAGA 1260

GAAGATTCTT GAAATGGCGA CGATCGATGG GGCGCGTTCG TTGGGAATGG ACCACGAGAT 1320 

TGGTTCCATC GAAACCGGCA AGCGCGCGGA CCTTATCCTG CTTGACCTGC GTCACCCTCA 1380 

GACGACTCCT CACCATCATT TGGCGGCCAC GATCGTGTTT CAGGCTTACG GCAATGAGGT 1440

GGACACTGTC CTGATTGACG GAAACGTTGT GATGGAGAAC CGCCGCTTGA GCTTTCTTCC 1500 

CCCTGAACGT GAGTTGGCGT TCCTTGAGGA AGCGCAGAGC CGCGCCACAG CTATTTTGCA 1560 

GCGGGCGAAC ATGGTGGCCA ACCCAGCTTG GCGCAGCCTC TAGGAAATGA CGCCGTTGCT 1620 

GCATCCGCCG CCCCTTGAGG AAATCGCTGC CATCTTGGCG CAGCTCGGAT TGGG 1674
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Lys Ser Val Pro Pro Pro Ala Asp
35 40 45

Arg Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
"° JJ^ V. ^„ ^

110 

"a o.„ V. ^^^^ 

«a ne ^„ ^^^^^ 

- V. ^ 

160 

-X n, ^ 

- - 

190 

ne ^ 

205 

^ - "a 
He. ^ ^ 

- «. 0. ^ ^ 

255 

Ser Pro Ala Glu Tyr Met Glu Cys Tvr Glv t t 

260 I-eu Asp Glu Arg Leu 

270 

Gin Val Ala His Cys Val Tvr dj,^ * 

cy val ivr Phe Asp Arg Lya Asp Val Arg Leu Leu 

285 

- va. vax s.. ^„ 

30S ^" - - V. «u r „^ 

■^■^^ 320 



-a Val Oly Xle Gly ... ^„ 

"° 335 
Met He Gly Asp Met Lys Phe Met Ala His ti 

340 ' ^ "^^ ^la His He His Arg Ala Val Hi 



345 - -~; 

350 

..p v.. ... „^ ^^^^ 

J- - ox, .,3 „^ 

400 
Gln Thr Thr Pro His His His Leu Ala Ala Thr Ile Val Phe Gln Ala
405 410 415

Tyr Gly Asn Glu Val Asp Thr Val Leu Ile Asp Gly Asn Val Val Met
420 425 430

Glu Asn Arg Cys Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe
435 440 445

Leu Glu Gly Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn
450 455 460

Met Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu
465 470 475 480

His Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Arg Leu Gly
485 490 495



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Glu Ser Val Pro Pro Pro Ala Asp
35 40 45

Arg Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
85 90 95

Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg
Ala Glu Ala Val Arg Ser Gly Ile Thr Thr r^ \

120 "^^^ '^^^ Ala Asp 

" 125 

n. „^ 

- - V. 

o.. ^ - 

0- V. 0. ^„ „^ - 

190 

Arg lie Thr Ala Leu Ser Asd cin ^ „• 

ASP Gin lyr Hxs Gly Thr Ala Gly Gly Arg 

205 



"e S„ V. „^ ^ ^ 

220 

"et „, „^ ^ 

- - ^ - 

255 

'V. Oa„ „.s 3., 

- V. 

285 

His Arg His Asn Val Lvs Val ai o 

Lys val Ala ser Gin Val Val Ser Asn Ala Tyr 

300 

- c.. 3e. OX, V. 

"a M,. - 

"° 335 

ne o.. J., ^ 

Ma V. . „^ 

365 

"e JSP ^ 3„ ^^^^ 

«u XH. ^ ^„ ^^^^ ^ _ 

• •■■ -^-^ ■-. ... 400 
Gln Thr Thr Pro His His His Leu Ala Ala Thr Ile Val Phe Gln Ala
405 410 415

Tyr Gly Asn Glu Val Asp Thr Val Leu Ile Asp Gly Asn Val Val Met
420 425 430

Glu Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe
435 440 445

Leu Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn
450 455 460

Met Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu
465 470 475 480

His Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Arg Leu Gly
485 490 495



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 496 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Glu Ser Val Pro Pro Pro Ala Asp
35 40 45

Arg Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
85 90 95

Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu Tyr Cys
100 105 110
"a J.. se. c., ^„ ^^^^

«. ne T.. P„ „^ ^ ^ 

140 



Glu Val Gly Val Arg Val Val Tv- * . 

"5 ""^^ Met Phe Phe Asp Arg Met 

'^^^ 160 

ci. ^ ne ai„ 01, V,. ^„ 



170 



Gin Val Glu Leu Cvs s^^r m 

c:u uys £,er lie Met Glu Glv rhr Als v^i tm . 
180 - Ala Lys Asp 

190 

Arg lie ^Thr Ala Leu Ser Asp Gin Tyr His Glv Thr ai 

195 ^ ^ "^^^ Ala Gly Gly Arg 

"^"^ 205 

XXe S« V,: ..p 

220 

Gly Met Arg Trp Ala Gin Ala Phe Ala Aye , 

225 230 ^ Ala Val Met Trp 

240 



- HU «e. 

255 

ser P„ «, OX. ^ oxu olv ^„ ^ ^„ 

«n vax ^ 



280 285 



His Ar. His M„ V.1 val M. ser ol„ v,l vaX Ser .s„ Ty. 

"^^^ 300 
- o., .e, 01, V,. ^^^^ 

"a V.X OX, OX. oX, se. ^ 

XXe OX, „e. .,3 

350 

Arg Asp Ala Asp Val Leu Thr Pro Glu Lvs ri« t ^, 

355 J° ^-^^ Leu Glu Met Ala Thr 

365 

lie Asp Gly Ala Arg Ser Leu Gly Met Asd Hi^ n m , 

. 370 ^ "^"^ Glu lie Gly Ser lie 

;■ J/s 380 

,.x„ „^ ^ ^^.^^^^ 
Tyr Gly Asn Glu Val Asp Thr Val Leu Ile Asp Gly Asn Val Val Met
420 425 430

Glu Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe
435 440 445

Leu Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn
450 455 460

Met Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu
465 470 475 480

His Pro Leu Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Arg Leu Gly
485 490 495



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Glu Ser Val Pro Pro Pro Ala Asp
35 40 45

Gln Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe His Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
85 90 95

Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu Tyr Cys
100 105 110

Ala Glu Ala Val Arg Ser Gly Ile Thr Thr Ile Asn Glu Asn Ala Asp
115 120 125
"e ^ ^^^^ ^

- - - - V. ;;: 

X55 

Asp Gly Arg He Gin Gly Tvr v.i ^ 

ly IVr Val Asp Ala l,eu Lys Ala Arg Ser Pro 

175 

- - - - « 

ISO 

- - - U„ s„ 

- ^ - - - - ... v. 

220 

Gly Met Arg Trn Ala rir, ai ^ 

P Ala p.. «3 Ar, 

Thr Leu' His Met Ala n,, o . 

«. .lu se, , ^ 

- - ^ ^ ^ 

^- - - V. , 

285 

His Arg His Asn Val Lvs v^.i . 

I-ys val Ala ser Gin Val Val Ser Asn Ala Tyr 

300 

«a V. o.. „e 0.. .p ^„ 
ne j,p ^^^^^ 
-p ^p ^^^^ ^^^^ 

"e ^p ^ 2 

380 

- - ox. ^ ^„ .^^ 

~: V 415 *' :'. j ! . 

... " ... .430.- :.:.v:;).x;;:;ji:^2: . 



330 - - 

335 
Glu Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe
435 440 445

Leu Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn
450 455 460

Met Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu
465 470 475 480

His Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Arg Leu Gly
485 490 495



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

Met Gln Thr Leu Ser Ile Gln His Gly Thr Leu Val Thr Met Asp Gln
1 5 10 15

Tyr Arg Arg Val Leu Gly Asp Ser Trp Val His Val Gln Asp Gly Arg
20 25 30

Ile Val Ala Leu Gly Val His Ala Glu Ser Val Pro Pro Pro Ala Asp
35 40 45

Arg Val Ile Asp Ala Arg Gly Lys Val Val Leu Pro Gly Phe Ile Asn
50 55 60

Ala His Thr His Val Asn Gln Ile Leu Leu Arg Gly Gly Pro Ser His
65 70 75 80

Gly Arg Gln Phe Tyr Asp Trp Leu Phe Asn Val Val Tyr Pro Gly Gln
85 90 95

Lys Ala Met Arg Pro Glu Asp Val Ala Val Ala Val Arg Leu Tyr Cys
100 105 110

Ala Glu Ala Val Arg Ser Gly Ile Thr Thr Ile Asn Glu Asn Ala Asp
115 120 125

Ser Ala Ile Tyr Pro Gly Asn Ile Glu Ala Ala Met Ala Val Tyr Gly
130 135 140
Glu Val Gly Val Arg Val Val Tyr Ala Arg Met Phe Phe Asp Arg Met
145 150 155 160

Asp Arg Arg Ile Gln Gly Tyr Val Asp Ala Leu Lys Ala Arg Ser Pro
165 170 175

Gln Val Glu Leu Cys Ser Ile Met Glu Glu Thr Ala Val Ala Lys Asp
180 185 190

Arg Ile Thr Ala Leu Ser Asp Gln Tyr His Gly Thr Ala Gly Gly Arg
195 200 205

Ile Ser Val Trp Pro Ala Pro Ala Thr Thr Thr Ala Val Thr Val Glu
210 215 220

Gly Met Arg Trp Ala Gln Ala Phe Ala Arg Asp Arg Ala Val Met Trp
225 230 235 240

Thr Leu His Met Ala Glu Ser Asp His Asp Glu Arg Ile His Gly Met
245 250 255

Ser Pro Ala Glu Tyr Met Glu Cys Tyr Gly Leu Leu Asp Glu Arg Leu
260 265 270

Gln Val Ala His Cys Val Tyr Phe Asp Arg Lys Asp Ile Arg Leu Leu
275 280 285

His Arg His Asn Val Lys Val Ala Ser Gln Ala Val Ser Asn Ala Tyr
290 295 300

Leu Gly Ser Gly Val Ala Pro Val Pro Glu Met Val Glu Arg Gly Met
305 310 315 320

Ala Val Gly Ile Gly Thr Asp Asn Gly Asn Ser Asn Asp Ser Val Asn
325 330 335

Met Ile Gly Asp Met Lys Phe Met Ala His Ile His Arg Ala Val His
340 345 350

Arg Asp Ala Asp Val Leu Thr Pro Glu Lys Ile Leu Glu Met Ala Thr
355 360 365

Ile Asp Gly Ala Arg Ser Leu Gly Met Asp His Glu Ile Gly Ser Ile
370 375 380

Glu Thr Gly Lys Arg Ala Asp Leu Ile Leu Leu Asp Leu Arg His Pro
385 390 395 400

Gln Thr Thr Pro His His His Leu Ala Ala Thr Ile Val Phe Gln Ala
405 410 415

Tyr Gly Asn Glu Val Asp Thr Val Leu Ile Asp Gly Asn Val Val Met
420 425 430

Glu Asn Arg Arg Leu Ser Phe Leu Pro Pro Glu Arg Glu Leu Ala Phe
435 440 445

Leu Glu Glu Ala Gln Ser Arg Ala Thr Ala Ile Leu Gln Arg Ala Asn
450 455 460

Met Val Ala Asn Pro Ala Trp Arg Ser Leu Glu Met Thr Pro Leu Leu
465 470 475 480

His Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Gln Leu Gly
485 490 495
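
The protein entries in this listing interleave rows of three-letter residue codes with rows of position numbers. As a minimal sketch, assuming Python and the 20 standard three-letter codes, such an entry can be converted to one-letter code and its residue count compared with the declared LENGTH field. The function name `parse_listing` is an illustrative choice and is not part of the application.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Return the one-letter sequence encoded by a three-letter listing.

    Rows made only of position numbers are skipped. Any other token that
    is not a standard residue code raises ValueError, so a damaged entry
    is reported instead of being silently shortened.
    """
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue  # blank row or position ruler
        for tok in tokens:
            if tok not in THREE_TO_ONE:
                raise ValueError(f"unrecognized residue token: {tok!r}")
            residues.append(THREE_TO_ONE[tok])
    return "".join(residues)


# Last row of SEQ ID NO: 26 (positions 481-496) with its position ruler.
tail = """
His Pro Pro Pro Leu Glu Glu Ile Ala Ala Ile Leu Ala Gln Leu Gly
485 490 495
"""
one_letter = parse_listing(tail)
print(one_letter, len(one_letter))
```

Running the same function over a complete entry and comparing `len(...)` with the LENGTH field (for example 496 for SEQ ID NO: 26) is a quick consistency check on a transcribed listing.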
What Is Claimed Is:

1. A DNA fragment encoding a homolog of atrazine chlorohydrolase
and comprising the sequence of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NOS:7-11
and SEQ ID NOS:17-21.

2. An s-triazine-degrading protein having at least one amino acid
different from the protein of SEQ ID NO:2, wherein the coding region of the
nucleic acid encoding the s-triazine-degrading protein has at least 95% homology
to SEQ ID NO:1 and wherein the s-triazine-degrading protein has an altered
catalytic activity, as compared with the protein having the sequence of SEQ ID
NO:2.

3. The protein of Claim 2 wherein the protein is selected from the 
group consisting of SEQ ID NOS: 5, 6 and 22-26. 

4. The protein of Claim 2 wherein the substrate for the s-triazine
degrading protein is 2-chloro-4-(ethylamino)-6-(isopropylamino)-1,3,5-triazine.

5. The protein of Claim 2 wherein the substrate for the s-triazine
degrading protein is 2-chloro-4-(ethylamino)-6-(tertiary butyl-amino)-1,3,5-triazine.

6. The protein of Claim 2 wherein the substrate for the s-triazine
degrading protein is 2,4,6-triamino-s-triazine.

7. A protein selected from the group consisting of proteins comprising 
the amino acid sequences of SEQ ID NOS: 5, 6 and 22-26. 



8. A remediation composition comprising a cell producing the protein 
of Claim 2.

9. The composition of Claim 8, wherein the composition is suitable
for treating soil or water.

10. A remediation composition comprising the protein of Claim 2.

11. The composition of Claim 10 wherein the composition is suitable
for treating soil or water.

12. The DNA fragment of Claim 1 in an expression vector. 

13. The DNA fragment of Claim 12 in a cell. 

14. The DNA fragment of Claim 13 wherein the cell is a bacterium. 

15. The DNA fragment of Claim 14 wherein the cell is E. coli.

16. A DNA fragment having a portion of its nucleic acid sequence
having at least 95% homology to a DNA fragment consisting of the sequence
beginning at position 236 and ending at position 1655 of SEQ ID NO:1,
wherein the DNA fragment is capable
of hybridizing under stringent conditions to SEQ ID NO:1 and wherein there is
at least one amino acid change in the protein encoded by the DNA fragment as
compared with SEQ ID NO:2 and wherein the protein encoded by the DNA
fragment is capable of dechlorinating at least one s-triazine-containing
compound and has an enzymatic activity different from the enzymatic activity of
the protein corresponding to SEQ ID NO:2.

17. The fragment of Claim 16, wherein the s-triazine-containing
compound is 2-chloro-4-(ethylamino)-6-(isopropylamino)-1,3,5-triazine.

18. The fragment of Claim 16, wherein the s-triazine-containing
compound is 2-chloro-4-(ethylamino)-6-(tertiary butyl-amino)-1,3,5-triazine.

19. The fragment of Claim 16, wherein the s-triazine-containing
compound is (2,4,6-triamino-s-triazine).

20. The fragment of Claim 16 wherein the enzymatic activity is an
improved ability to degrade atrazine.

21. The fragment of Claim 20 wherein the enzymatic activity is a 10-fold
improvement in the ability to degrade atrazine.

22. The fragment of Claim 16, wherein the enzymatic activity is an 
altered substrate. 



23. The protein of Claim 2 which is a homotetramer.

24. The protein of Claim 2 bound to an immobilization support. 

25. A method for treating a sample comprising an s-triazine-containing
compound comprising the step of:

adding a composition to a sample comprising an s-triazine-containing
compound, wherein the composition comprises a protein
encoded by a gene having at least a portion of the nucleic acid
sequence of the gene having at least 95% homology to the
sequence beginning at position 236 and ending at position 1655 of
SEQ ID NO:1, wherein the gene is capable of hybridizing under
stringent conditions to SEQ ID NO:1, wherein there is at least one
amino acid change in the protein encoded by the DNA fragment as
compared with SEQ ID NO:2 and wherein the protein has an
altered catalytic activity as compared to the protein having the
amino acid sequence of SEQ ID NO:2.

26. The method of Claim 25 wherein the composition comprises
bacteria expressing the protein.

27. The method of Claim 25 wherein the s-triazine-containing
compound is 2-chloro-4-(ethylamino)-6-(isopropylamino)-1,3,5-triazine.

28. The method of Claim 25 wherein the s-triazine-containing
compound is 2-chloro-4-(ethylamino)-6-(tertiary butyl-amino)-1,3,5-triazine.

29. The method of Claim 25 wherein the s-triazine-containing
compound is (2,4,6-triamino-s-triazine).

30. The method of Claim 25 wherein the protein encoded by the gene 
is selected from the group consisting of SEQ ID NOS: 5, 6 and 22-26. 

31. A method for obtaining homologs of an atrazine chlorohydrolase 
comprising the steps of: 

obtaining a nucleic acid sequence encoding atrazine 
chlorohydrolase; 

mutagenizing the nucleic acid to obtain a modified nucleic 
acid sequence that encodes for a protein having an amino acid 
sequence with at least one amino acid change relative to the amino 
acid sequence of the atrazine chlorohydrolase; 

screening the proteins encoded by the modified nucleic acid
sequence; and 

selecting proteins with altered catalytic activity as 
compared to the catalytic activity of the atrazine chlorohydrolase. 

32. The method of Claim 31 wherein the atrazine chlorohydrolase
nucleic acid sequence is SEQ ID NO:1.

33. The method of Claim 31 wherein the altered catalytic activity is an
improved ability to degrade atrazine.

34. The method of Claim 31 wherein the selected proteins have an
altered substrate activity.
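
The steps recited in Claim 31 (obtain a coding sequence, mutagenize it, screen the encoded proteins, select those with altered activity) can be sketched in Python. This is only an in-silico illustration under stated assumptions: `toy_assay` is a hypothetical stand-in for a laboratory activity screen, the 95% threshold is applied as plain percent identity over equal-length sequences (homology may be measured differently in practice), random point substitution stands in for whatever mutagenesis method is actually used, and the 30-nucleotide parent is the start of the coding region as it appears in SEQ ID NO: 20 rather than a full gene.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, enumerated in TCAG order for each codon position.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mutagenize(dna, n_mutations, rng):
    """Return a copy of dna carrying n_mutations random point substitutions."""
    seq = list(dna)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def select_homologs(parent_dna, assay, n_variants=200, n_mutations=1,
                    min_identity=95.0, seed=0):
    """Mutagenize, screen with `assay`, and keep variants whose score differs."""
    rng = random.Random(seed)
    parent_protein = translate(parent_dna)
    parent_activity = assay(parent_protein)
    hits = []
    for _ in range(n_variants):
        variant = mutagenize(parent_dna, n_mutations, rng)
        protein = translate(variant)
        if protein == parent_protein:
            continue  # silent mutation: no amino acid change
        if percent_identity(variant, parent_dna) < min_identity:
            continue  # outside the homology window
        activity = assay(protein)
        if activity != parent_activity:
            hits.append((variant, protein, activity))
    return hits


def toy_assay(protein):
    """Hypothetical score (histidine count); a real screen measures activity."""
    return protein.count("H")


# First ten codons of the coding region as listed in SEQ ID NO: 20.
PARENT = "ATGCAAACGCTCAGCATCCAGCACGGTACC"
hits = select_homologs(PARENT, toy_assay)
print(translate(PARENT), len(hits))
```

In a real workflow the assay would be an experimental measurement, for example a comparison of degradation rates for a given substrate, and the selection step would keep variants with improved or otherwise altered activity relative to the parent enzyme.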
1 GCGA 4

nil 

1 CTCGGGTAACTTCTTGAGCGCGGCCACAGCAGCCTTGATCATGAAGGCGA 50 
5 GCATGGTGACCTTGACGCCGCTCTTTTCGTTCTCTTTGTTGAACTGCACG 54 

IIIMIIIIIIIilllilllllMIIIMIIIIIIIIIIIIIIIIIIIil 

51 GCATGGTGACCTTGACGCCGCTCTTTTCGTTCTCTTTGTTGAACTGCACG 100 
55 CGAAAGGCTTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGACGTG 104 

IlilillllllllilllllllllMlllllllllllllllillllMIII 

101 CGAAAGGCTTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGACGTG 150 
105 CGGGATGACCACCCAGTTGCGGTGCAGGTTTTTCGATGGCATAATATCTG 154 

lllllllillllllllllllllllllllllllliilillilMlilllll 

151 CGGGATGACCACCCAGTTGCGGTGCAGGTTTTTCGATGGCATAATATCTG 200 
155 CGTTGCGACGTGTAACACACTATTGGAGACATATCATGCAAACGCTCAGC 204 

llllllllllllillilillllllillllllllllllllllllllllill 

201 CGTTGCGACGTGTAACACACTATTGGAGACATATCATGCAAACGCTCAGC 250 
205 ATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCTTGG 254 

IIMIIillllMIMIIMIIIIillllMIIMilMIIIIMIIIil 

251 ATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCTTGG 300 

255 GGATAGCTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAGTGC 304 

lllllllllllllllllllllllilllllllllllllillllllllllii 

301 GGATAGCTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAGTGC 350 


305 ACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGCGGC 354 

IIIIIIIIIIMIIIIIIIIIIIIIIIIMIIIIIIIMIIIIIIMIII 

351 ACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGCGGC 400

• • • • • 
355 AAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCAGAT 404 

MlllllllllllllllllilllllllllllllillllllllllMilll 

401 AAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCAGAT 450 
405 CCTCCTGCGCGGAGGGCCCTCGCACGGGCGTCAATTCTATGACTGGCTGT 454 

llllllMlllillllMlllllllll illllllllM Mill i Mill 

451 CCTCCTGCGCGGAGGGCCCTCGCACGGACGTCAATTCTATGACTGGCTGT 500 

455 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 504 

IIIIIMMIIIIIIIllllllllllllllllllllllllllltllllM 

501 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 550 
505 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 554 

MIIIIIIMIIilllMIIIIMMIIIMnillllllllllllllll 

551 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 600 

555 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 604 

|||||||||||||||||||||||||||||||||||||||||||||||||| 
601 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 650 
605 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 654 

llllllllllllliilMlllillMlllillllMMIIIMIIIIIII 

651 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 700 
655 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 704 

lillillMlllliilllllllllMllllililMMIilillllllll 

701 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 750 

• • • • . 

705 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGGAACGGCTG 754 

lllllllllllillllMlllilllllillllMllillll llllllll 

751 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGAAACGGCTG 800 
755 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 804 

lllllliillilMIIIIIMIMIIillllllllllllllllllMIII 

801 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 850 
805 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 854 

liillilllliillillilMllillllilillMlliilllllllllll 

851 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 900 

855 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 904 

llllllllilllllllllllllllllllllllllllllllll llllllll 
901 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 950 

905 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 954 

lliliiliiiiilillllllilililililliliiililllliiliiili 

951 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 1000 
955 ATGAGTCCCGCCGAGTACATGGAGTGTTACGGACTCTTGGATGAGCGTCT 1004 

IlillMIMIIIIIIIIilllllllllllllllllilllllllilllll 

1001 ATGAGTCCCGCCGAGTACATGGAGTGTTACGGACTCTTGGATGAGCGTCT 1050 
1005 GCAGGTCGCGCATTGCGTGTACTTTGACCGGAAGGATGTTCGGCTGCTGC 1054 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIMIIIIIII 

1051 GCAGGTCGCGCATTGCGTGTACTTTGACCGGAAGGATGTTCGGCTGCTGC 1100 
1055 ACCGCCACAATGTGAAGGTCGCGTCGCAGGTTGTGAGCAATGCCTACCTC 1104 

lllllllllillllllllllllllilllllllllllllllllllllllll 

1101 ACCGCCACAATGTGAAGGTCGCGTCGCAGGTTGTGAGCAATGCCTACCTC 1150 

• . . , 

1105 GGCTCAGGGGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 1154 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIII 

1151 GGCTCAGGGGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 1200 

• • • • , 

1155 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGTAAACATGATCG 1204 

lllllllllllllllllllllllllllllllllillll lllllllllll 
1201 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGCAAACATGATCG 1250 

1205 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 1254 

|||||||||||||||||||||||||||||||||||||||||||||||||| 

1251 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 1300 

FIG. 1B 

1255 GACGTGCTGACCCCAGAGAAGATTCTTGAAATGGCGACGATCGATGGGGC 1304 

MIIMMIMIIIIIMIIilllilMlllllllllllllMIIIIIM 

1301 GACGTGCTGACCCCAGAGAAGATTCTTGAAATGGCGACGATCGATGGGGC 1350 
1305 GCGTTCGTTGGGAATGGACCACGAGATTGGTTCCATCGAAACCGGCAAGC 1354 

IIIIIIIIMIIIIMIIIIIMIIIIMMMMIIIMIIIIIIIMI 

1351 GCGTTCGTTGGGAATGGACCACGAGATTGGTTCCATCGAAACCGGCAAGC 1400 
1355 GCGCGGACCTTATCCTGCTTGACCTGCGTCACCCTCAGACGACTCCTCAC 1404 

llllillllllilllllMlilllllIMM lllllllllllll 

1401 GCGCGGACCTTATCCTGCTTGACCTGCGTCA.CCTCAGACGACTC..TCA 1447 
1405 CATCATTTGGCGGCCACGATCGTGTTTCAGGCTTACGGCAATGAGGTGGA 1454 

ilMIIIIIMIilllllllllllllllllllMIIIIIIIIIIIIMIi 

1448 CATCATTTGGCGGCCACGATCGTGTTTCAGGCTTACGGCAATGAGGTGGA 1497 
1455 CACTGTCCTGATTGACGGAAACGTTGTGATGGAGAACCGCCGCTTGAGCT 1504 

MllllillllMIIIIIMIillMIIIMIIMIIIIMIIIIIIIIi 

1498 CACTGTCCTGATTGACGGAAACGTTGTGATGGAGAACCGCCGCTTGAGCT 1547 
1505 TTCTTCCCCCTGAACGTGAGTTGGCGTTCCTTGAGGAAGCGCAGAGCCGC 1554 

IMIIIilllilllillllllMIIIMIillllllllllllillllMI 

1548 TTCTTCCCCCTGAACGTGAGTTGGCGTTCCTTGAGGAAGCGCAGAGCCGC 1597 
1555 GCCACAGCTATTTTGCAGCGGGCGAACATGGTGGCTAACCCAGCTTGGCG 1604 

MIIMIIIilillliilllllllllMIIIIIIIIMIIMIIIIIIII 

1598 GCCACAGCTATTTTGCAGCGGGCGAACATGGTGGCTAACCCAGCTTGGCG 1647 
1605 CAGCCTCTAGGAAATGACGCCGTTGCTGCATCCGCCGCCCCTTGAGGAAA 1654 

llllillllllllilllilllililllllMiiliillllllllllllll 

1648 CAGCCTCTAGGAAATGACGCCGTTGCTGCATCCGCCGCCCCTTGAGGAAA 1697 
1655 TCGCTGCCATCTTGGCGCGGCTCGGATTGGGGGGCGGACATGACCTTGAT 1704 

lllllllllllllllillllllllMllilllMllllllliiiMMII 

1698 TCGCTGCCATCTTGGCGCGGCTCGGATTGGGGGGCGGACATGACCTTGAT 1747 
1705 GGATACAGAATTGCCATGAATGCGGCACTTCCGTCCTTCGCTCGTGTGGA 1754 

llilillillillllllllllllllllilillllllllilllllllilll 

1748 GGATACAGAATTGCCATGAATGCGGCACTTCCGTCCTTCGCTCGTGTGGA 1797 
1755 ATCGTTGGTAGGTGAGGGTCGACTGCGGGCGCCAGCTTCCCGAAGAAGTG 1804 

MMMMIIIillMllilllMllllllllllllilllllllll III 

1798 ATCGTTGGTAGGTGAGGGTCGACTGCGGGCGCCAGCTTCCCGAAGAGGTG 1847 



1805 AAAG 1808 



1848 AAAGGCCCGAG 1858 

1 GAGCGCCGCCACAGCAGCCTTGATCATGAAGGCGA 35 

lliill lllllllllllliillillillllllli 
1 CTCGGGTAACTTCTTGAGCGCGGCCACAGCAGCCTTGATCATGAAGGCGA 50 

36 GCATGGTGACCTTGACGCCGCTCTTTTCGTTCTCTTTGTTGAACTGCACG 85 

lllilllllllllllllllllllilMllliillMNIIilllilllll 

51 GCATGGTGACCTTGACGCCGCTCTTTTCGTTCTCTTTGTTGAACTGCACG 100 
86 CGAAAGGCTTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGACGTG 135 

llliNlllllliilMIIMIIIIIIMIIIIIIIillllliiiiiiii 

101 CGAAAGGCTTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGACGTG 150 
136 CGGGATGACCACCCAGTTGCGGTGCAGGTTTTTCGATGGCGTAATATCTG 185 

iNIIIIMIIIIIIIIIIIIIMIIIIIIIIIIIIIIII llilllill 

151 CGGGATGACCACCCAGTTGCGGTGCAGGTTTTTCGATGGCATAATATCTG 200 
186 CGTTGCGACGTGTAACACACTATTGGAGACATATCATGCAAACGCTCAGC 235 

lilllllllllllllllilllllliillllllllllllllllllllllM 

201 CGTTGCGACGTGTAACACACTATTGGAGACATATCATGCAAACGCTCAGC 250 
236 ATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCTTGG 285 

lilllilllillllllilllllllllMllllllllllillMIIIIIII 

251 ATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCTTGG 300 

• • ' • . 

286 GGATAGCTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAGTGC 335 

illlllllllllllllllllllliilllllllllllilllllllilllll 
301 GGATAGCTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAGTGC 350 

336 ACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGCGGC 385 

IIMMMIIIIIIIIilllllllllllllllllllllllllllllliii 

351 ACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGCGGC 400 
386 AAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCAGAT 435 

illlllllililMlllllllliiilllillillllllllllMIIMII 

401 AAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCAGAT 450 

436 CCTCCTGCGCGGAGGGCCCTCGCACGGGCGTCAATTCTATGACTGGCTGT 485 

||||||||||||||||||||||||||| |||||||||||||||||||||| 

451 CCTCCTGCGCGGAGGGCCCTCGCACGGACGTCAATTCTATGACTGGCTGT 500 

486 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 535 

MlilillllllillMIIIIIIIMIIIIIIIIIIIIIlilllllllli 

501 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 550 
536 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 585 

MMIIIMIIIIIIIIIIIMIlilMIIIIIIIIIIIIMMIIMII 

551 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 600 

• • • • • 

586 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 635 

lllllllllillllllllMllllltilllillllllllllllllillll 

601 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 650 

636 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 685 
651 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 700 

686 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 735 
701 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 750 

736 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGAAACGGCTG 785 
751 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGAAACGGCTG 800 

786 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 835 
801 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 850 

836 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 885 
851 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 900 

886 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 935 
901 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 950 

936 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 985 
951 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 1000 

986 ATGAGTCCCGCCGATTAC[illegible] 1035 
1001 ATGAGTCCCGCCGAGTACATGGAGTGTTACGGACTCTTGGATGAGCGTCT 1050 

1036 GCAGGTCGCGCATTGCGTGTACTTTGACCGGAAGGATGTTCGGCTGCTGC 1085 
1051 GCAGGTCGCGCATTGCGTGTACTTTGACCGGAAGGATGTTCGGCTGCTGC 1100 

1086 ACCGCCACAATGTGAAGGTCGCGTCGCAGGTTGTGAGCAATGCCTACCTC 1135 
|||||||||||||||||||||||||||||||||||||||||||||||||| 
1101 ACCGCCACAATGTGAAGGTCGCGTCGCAGGTTGTGAGCAATGCCTACCTC 1150 

1136 GGCTCAGGGGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 1185 
1151 GGCTCAGGGGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 1200 

1186 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGTAAACATGATCG 1235 
1201 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGCAAACATGATCG 1250 

1236 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 1285 
1251 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 1300 

1286 GACGTGCTGACCCCAGAGAAGATTCTTGAAATGGCGACGATCGATGGGGC 1335 

IMIIIIIIIIIIIIIIIIMIIilllllllllllliMIIIIIIIIIII 

1301 GACGTGCTGACCCCAGAGAAGATTCTTGAAATGGCGACGATCGATGGGGC 1350 
1336 GCGTTCGTTGGGGATGGACCACGAGATTGGTTCCATCGAAACCGGCAAGC 1385 

MIIMIMMI llllllllllliMMIIIIIMIIIIIIMIIIIIi 

1351 GCGTTCGTTGGGAATGGACCACGAGATTGGTTCCATCGAAACCGGCAAGC 1400 

1386 GCGCGGACCTTATCCTGCTTGACCTGCGTCACCCTCAGACGACTCCTCAC 1435 

IMMIMIIIMIMMIIIMMIMMI 1 1 1 i 1 1 1 1 1 1 1 1 1 

1401 GCGCGGACCTTATCCTGCTTGACCTGCGTCA.CCTCAGACGACTC..TCA 1447 
1436 CATCATTTGGCGGCCACGATCGTGTTTCAGGCTTACGGCAATGAGGTGGA 1485 

lilMIIMIMIIMIIIIIIIiMIIIIIMIIMIIIIIIMIIIII 

1448 CATCATTTGGCGGCCACGATCGTGTTTCAGGCTTACGGCAATGAGGTGGA 1497 
1486 CACTGTCCTGATTGACGGAAACGTTGTGATGGAGAACCGCCGCTTGAGCT 1535 

I'lMMIIIIiillMlilllMIIMIMMMIIIIIMIMMIIir 

1498 CACTGTCCTGATTGACGGAAACGTTGTGATGGAGAACCGCCGCTTGAGCT 1547 
1536 TTCTTCCCCCTGAACGTGAGTTGGCGTTCCTTGAGGAAGCGCAGAGCCGC 1585 

IMMIMIMIIIIMIIillMlllllilMIIIMIIIIIlllilM 

1548 TTCTTCCCCCTGAACGTGAGTTGGCGTTCCTTGAGGAAGCGCAGAGCCGC 1597 
1586 GCCACAGCTATTTTGCAGCGGGCGAACATGGTGGCTAACCCAGCTTGGCG 1635 

lilllMIIIIIIMIMMIIIMMIIIIMIIMIIIIMIIIMil 

1598 GCCACAGCTATTTTGCAGCGGGCGAACATGGTGGCTAACCCAGCTTGGCG 1647 
1636 CAGCCTCTAGGAAATGACGCCGTTGCTGCATCCGCCGCCCCTTGAGGAAA 1685 

MMIIMIIIIIIMIIMMIMIIMIMIilllllMIIMIIIM 

1648 CAGCCTCTAGGAAATGACGCCGTTGCTGCATCCGCCGCCCCTTGAGGAAA 1697 
1686 TCGCTGCCATCTTGGCGCGGCTCGGATTGGGGGGCGGACATGACCTTGAT 1735 

illllMIMMMMIIIMIIMMIMIIIIillllllMIIIIIII 

1698 TCGCTGCCATCTTGGCGCGGCTCGGATTGGGGGGCGGACATGACCTTGAT 1747 
1736 GGATACAGAATTGCCATGAATGCGGCACTTCCGTCCTTCGCTCGTGTGGA 1785 

MIMIIIIIMIIIIIIMIIMIIIIIillMIMMMIIMMIII 

1748 GGATACAGAATTGCCATGAATGCGGCACTTCCGTCCTTCGCTCGTGTGGA 1797 
1786 ATCGTTGGTAGGTGAGGGTCGACTGCGGGCGCCAGCTTCCCGAAGAAGTG 1835 

IMIIMIIMIIIIMMilMIIIMIIMIIIIIIIMIMII Mi 

1798 ATCGTTGGTAGGTGAGGGTCGACTGCGGGCGCCAGCTTCCCGAAGAGGTG 1847 

1836 AAAGGCCCGAG 1846 

Mlllilllll 
1848 AAAGGCCCGAG 1858 

1 ASMVTLTPLFSFSLLNCTRKASRSVMSASSWLVTC 35 

lllillllllllllllllllillllillllllill 

1 SGNFLSAATAALIMKASMVTLTPLFSFSLLNCTRKASRSVMSASSWLVTC 50 
36 GMTTQLRCRFFDGIISALRRVTHYWRHIMQTLSIQHGTLVTMDQYRRVLG 85 

IIIIIIIMIIIMIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

51 GMTTQLRCRFFDGIISALRRVTHYWRHIMQTLSIQHGTLVTMDQYRRVLG 100 

• • • • , 

86 DSWVHVQDGRIVALGVHAESVPPPADRVIDARGKVVLPGFINAHTHVNQI 135 

lllliilllllilllllllillMlllllllllillllllllMllllil 

101 DSWVHVQDGRIVALGVHAESVPPPADRVIDARGKVVLPGFINAHTHVNQI 150 

• • . , , , 

136 LLRGGPSHGRQFYDWLFNVVYPGQKAMRPEDVAVAVRLYCAEAVRSGITT 185 

IIIMIIIIIIIIIIIIIIIIilillllllllllllllllllllllllll 

151 LLRGGPSHGRQFYDWLFNVVYPGQKAMRPEDVAVAVRLYCAEAVRSGITT 200 
186 INENADSAIYPGNIEAAMAVYGEVGVRVVYARMFFDRMDGRIQGYVDALK 235 

IIIIIIIIMIIIIMIIIIIillMIIIIIIIIIIIIIIIIIIIIIIII 

201 INENADSAIYPGNIEAAMAVYGEVGVRVVYARMFFDRMDGRIQGYVDALK 250 

• * * • • 

236 ARSPQVELCSIMEGTAVAKDRITALSDQYHGTAGGRISVWPAPATTTAVT 285 

IlilllliillihlllMlllilllMilllllllilllllllilllll 

251 ARSPQVELCSIMEETAVAKDRITALSDQYHGTAGGRISVWPAPATTTAVT 300 

• . . , 

286 VEGMRWAQAFARDRAVMWTLHMAESDHDERIHGMSPAEYMECYGLLDERL 335 

1 1 1 i 1 1 1 1 1 1 1 1 1 1 i i 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 M 1 1 1 i 1 1 1 1 1 1 1 1 1 

301 VEGMRWAQAFARDRAVMWTLHMAESDHDERIHGMSPAEYMECYGLLDERL 350 
336 QVAHCVYFDRKDVRLLHRHNVKVASQVVSNAYLGSGVAPVPEMVERGMAV 385 

IIIMIIiMlillllillllllllllllMIMIIIMIIIIIIIIIM 

351 QVAHCVYFDRKDVRLLHRHNVKVASQVVSNAYLGSGVAPVPEMVERGMAV 400 
386 GIGTDNGNSNDSVNMIGDMKFMAHIHRAVHRDADVLTPEKILEMATIDGA 435 

IIIMIIillll.llllilllllllllllllillillllllllllMIII 

401 GIGTDNGNSNDSANMIGDMKFMAHIHRAVHRDADVLTPEKILEMATIDGA 450 
436 RSLGMDHEIGSIETGKRADLILLDLRHPQTTPHHHLAATIVFQAYGNEVD 485 

IlillllllMlltlMIIIIIIIIII > . Illllilllllllllll 

451 RSLGMDHEIGSIETGKRADLILLDLRHLRRLS.HHLAATIVFQAYGNEVD 499 
486 TVLIDGNVVMENRRLSFLPPERELAFLEEAQSRATAILQRANMVANPAWR 535 

lllllillllillllllllMlllllililllinillilllllMIIM 

500 TVLIDGNVVMENRRLSFLPPERELAFLEEAQSRATAILQRANMVANPAWR 549 
536 SL*EMTPLLHPPPLEEIAAILARLGLGGGHDLDGYRIAMNAALPSFARVE 585 

lllillllllilMllilMMIillllllllllllillllMIIIIIM 

550 SL*EMTPLLHPPPLEEIAAILARLGLGGGHDLDGYRIAMNAALPSFARVE 599 



586 SLVGEGRLRAPASRRSE... 602 

llllllllllllllhl 
600 SLVGEGRLRAPASRRGERPE 619 

1 SAATAALIMKASMVTLTPLFSFSLLNCTRKASRSVMSASSWLVTC 45 

IINMIIilliillMlilllMllllllilllliiiiiiiiii 

1 SGNFLSAATAALIMKASMVTLTPLFSFSLLNCTRKASRSVMSASSWLVTC 50 
46 GMTTQLRCRFFDGVISALRRVTHYWRHIMQTLSIQHGTLVTMDQYRRVLG 95 

llllilllillihillllllillllllillllllllillllllllllll 

51 GMTTQLRCRFFDGIISALRRVTHYWRHIMQTLSIQHGTLVTMDQYRRVLG 100 
96 DSWVHVQDGRIVALGVHAESVPPPADRVIDARGKVVLPGFINAHTHVNQI 145 

IIMIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIMMIillllllill 

101 DSWVHVQDGRIVALGVHAESVPPPADRVIDARGKVVLPGFINAHTHVNQI 150 
146 LLRGGPSHGRQFYDWLFNVVYPGQKAMRPEDVAVAVRLYCAEAVRSGITT 195 

MlllllllilllllllllllllMIIMIIIMIIIllMIIIMIIII 

151 LLRGGPSHGRQFYDWLFNVVYPGQKAMRPEDVAVAVRLYCAEAVRSGITT 200 
196 INENADSAIYPGNIEAAMAVYGEVGVRVVYARMFFDRMDGRIQGYVDALK 245 

JMIillllllllllliMlilillllllllllllillllllMilllll 

201 INENADSAIYPGNIEAAMAVYGEVGVRVVYARMFFDRMDGRIQGYVDALK 250 
246 ARSPQVELCSIMEETAVAKDRITALSDQYHGTAGGRISVWPAPATTTAVT 295 

lllilllMllillMIIIIIIIIIIIIIIIIIIMIIIMIIIIIIIII 

251 ARSPQVELCSIMEETAVAKDRITALSDQYHGTAGGRISVWPAPATTTAVT 300 
296 VEGMRWAQAFARDRAVMWTLHMAESDHDERIHGMSPADYMECYGLLDERL 345 

MIIMIIIIIIMIIIIMIIIIIIIIIIIIIIIihllilMlillii 

301 VEGMRWAQAFARDRAVMWTLHMAESDHDERIHGMSPAEYMECYGLLDERL 350 
346 QVAHCVYFDRKDVRLLHRHNVKVASQVVSNAYLGSGVAPVPEMVERGMAV 395 

MlliMllllillllMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

351 QVAHCVYFDRKDVRLLHRHNVKVASQVVSNAYLGSGVAPVPEMVERGMAV 400 

396 GIGTDNGNSNDSVNMIGDMKFMAHIHRAVHRDADVLTPEKILEMATIDGA 445 

lililllliill-illlllillllllllllllllilllllllllllllll 
401 GIGTDNGNSNDSANMIGDMKFMAHIHRAVHRDADVLTPEKILEMATIDGA 450 

446 RSLGMDHEIGSIETGKRADLILLDLRHPQTTPHHHLAATIVFQAYGNEVD 495 

lillillillllllMllllllllill ■ . Illllllllllllllll 

451 RSLGMDHEIGSIETGKRADLILLDLRHLRRLS.HHLAATIVFQAYGNEVD 499 
496 TVLIDGNVVMENRRLSFLPPERELAFLEEAQSRATAILQRANMVANPAWR 545 

lilllMillMllllllllillMIIIIIIIMIIIIIIIillllilll 

500 TVLIDGNVVMENRRLSFLPPERELAFLEEAQSRATAILQRANMVANPAWR 549 
546 SL*EMTPLLHPPPLEEIAAILARLGLGGGHDLDGYRIAMNAALPSFARVE 595 

lllllillillilMllilllllllllllilllllllllllllilliill 

550 SL*EMTPLLHPPPLEEIAAILARLGLGGGHDLDGYRIAMNAALPSFARVE 599 



596 SLVGEGRLRAPASRRSERPE 615 

llllllllllllilhllll 
600 SLVGEGRLRAPASRRGERPE 619 

545 CGGTATCGGGGAATTNTTGAGCGCGGCCACAGCAGCCNTGATCATGAAGG 496 

II I Ihllllilllllllllllllllhllllllllllll 
1 ...CTCGGGTAACTTCTTGAGCGCGGCCACAGCAGCCTTGATCATGAAGG 47 

495 CGAGCATGGTGACCTNGACGCCGTNTTTTNGTTNTTTTTTGTTGAACTGC 446 

llllllllllllllhlllllll : llh h I lllllllllllll 

48 CGAGCATGGTGACCTTGACGCCGCTCTTTTCGTTCTCTTTGTTGAACTGC 97 

445 ACGCGAAAGG.TTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGAC 397 

llllllllll llllllllllllllllllllilllillllllilllllll 
98 ACGCGAAAGGCTTCCAGGTCGGTGATGTCCGCGTCGTCGTGGTTGGTGAC 147 

396 GTGCGGGATGACCACCCAGNTGCGGTGCAGGTTTTTCGATGGCATAATAT 347 

irilllllltllllllllhlllllllllllllllllMllllllillll 

148 GTGCGGGATGACCACCCAGTTGCGGTGCAGGTTTTTCGATGGCATAATAT 197 
346 CTGCGTTGCGACGTGTAACACACTANTGGAGACATATCATGCAAACGCTC 297 

IIIIIJIIIIIIIIilllllMllhlllllillllllllillllillll 

198 CTGCGTTGCGACGTGTAACACACTATTGGAGACATATCATGCAAACGCTC 247 

296 AGCATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCT 247 

lllllllllllll Mil IIIIIIIMIIIIIilllllMIIIIIIIIII I 
248 AGCATCCAGCACGGTACCCTCGTCACGATGGATCAGTACCGCAGAGTCCT 297 

246 TGGGGATAGNTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAG 197 

I II II I I I I : I I I I I I I 11 II I I I I I II M I II I II I II II I II I I I I I I 
298 TGGGGATAGCTGGGTTCACGTGCAGGATGGACGGATCGTCGCGCTCGGAG 347 

196 TGCACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGC 147 

llilllllllllllilllllllllllllllllllllllllllllllllll 
348 TGCACGCCGAGTCGGTGCCTCCGCCAGCGGATCGGGTGATCGATGCACGC 397 

146 GGCAAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCA 97 

llllllllllllllllllllllllllllllllllllllllllllllllll 
398 GGCAAGGTCGTGTTACCCGGTTTCATCAATGCCCACACCCATGTGAACCA 447 

96 GATCCTCCTGCGCGGAGGGCCNTCGCACGGGCGTCAATTNTATGACTGGC 47 

llllllllllilllllllllhllllllll lllllllhllllllllll 
448 GATCCTCCTGCGCGGAGGGCCCTCGCACGGACGTCAATTCTATGACTGGC 497 

46 TGTTCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGA 1 

llllllllllllllllllllllllllllllllllllllllllllll 
498 TGTTCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTA 547 



WO 98/31816 PCT/US98/00944 



1 ...CCTGCGCGGAGGGCCTCCGCACGGGCGTCAATTCTATGACTGGCTGT 47 

IIIIIINMIIIII IMIMI llllllllllllillllillll 

451 CCTCCTGCGCGGAGGGCCCTCGCACGGACGTCAATTCTATGACTGGCTGT 500 

48 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTANCG 97 
lllllllililllllllllllllllilllllllillllllilllllhll 
501 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 550 

98 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 147 

IMMIMIiMIIMIIIIMIIIIIIIIIIIIIIIIIIIMIIIMII 

551 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 600 

148 GATCAACGAAAACNCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 197 

I I M I I I I I I I I I : i I I I I I i i I I I I I I I I I I I I I I I I i M I I I I I I I I I 
601 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 650 

198 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 247 

MllllillililMlllllllllllllllillllllMllililllill 

651 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 700 
248 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 297 

IlilllllllllllillilllllllllillllllillMlllllllliil 

701 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 750 
298 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGANGAAACNGCTG 347 

MllllllllllliilllllllllllllllillMlllhlMlhllM 

751 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGAAACGGCTG 800 

348 TGGCCAAAGATCGGATCACANCCCTGTCANATCANTATCATGGCACNGCA 397 

IIIIMIIIIIIillllli|:||||||||:||||:)||||||||||:||| 
801 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 850 

398 NGAGGTCCTATATCANTTTGGCCCGCTCCTGCCACTACCACNGCGGTGAC 447 

'^\\\\\ liiiiihiiiiiiiiiiiiiiiiiiiiiiiihiiiiiiii 

851 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 900 
448 ATTTAAANGAATCCATGGGCCA ACCTCCCCCGTGATCCGGCGGTAA 493 

i IMhIIII i II II llllllllllll llllllll 

901 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 950 

494 TGTGAC * * 499 

INI 

951 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 1000 



TNGCAGGTTGTGAGCA..TGCTACTTC 336 

hilillllllllill Mil II 

1101 ACCGCCACAATGTGAAGGTCGCGTCGCAGGTTGTGAGCAATGCCTACCTC 1150 

• • • , 

335 GGTTCAGGNGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 286 

II ilMhIlllllllillllllllllllllllllllllllllllllll 

1151 GGCTCAGGGGTGGCCCCCGTGCCAGAGATGGTGGAGCGCGGCATGGCCGT 1200 
285 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGTAAACATGATCG 236 

lllllllllllllllllllilllllllllllllll!!! Illilllllll 

1201 GGGCATTGGAACAGATAACGGGAATAGTAATGACTCCGCAAACATGATCG 1250 
235 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 186 

.IIIIINIMINIIilllllllllllllllllllilllllllliiiiii 

1251 GAGACATGAAGTTTATGGCCCATATTCACCGCGCGGTGCATCGGGATGCG 1300 

185 GACGTGCTGACCCCAGAGAAGATTNTTGAAATGGCGACGATCGATGGGGC 136 
llllllllillllllllllllllhlllllllilllllllllllllllll 
1301 GACGTGCTGACCCCAGAGAAGATTCTTGAAATGGCGACGATCGATGGGGC 1350 

135 GCGTTTCGTTGGGGATGGACCACGAGATTGGTTCCATCGAAACCGGCAAG 86 

Ml Mlllllll IIIIIMIIIIIIilllllllillllllllilllll 

1351 GCG.TTCGTTGGGAATGGACCACGAGATTGGTTCCATCGAAACCGGCAAG 1399 
85 CGCGCGGACCTTATCCTGCTTGACCTGCGTCACCCTCAGACGACTCCTCA 36 

llllilllilllliilllllllMllllllil illllllllllll 

1400 CGCGCGGACCTTATCCTGCTTGACCTGCGTCA.CCTCAGACGACTC..TC 1446 

35 CCATCATTTGGCGGCCACGATCGTGTTTCAGGCTT 1 
llllilllllllllllllllllllllllllllll 
1447 ACATCATTTGGCGGCCACGATCGTGTTTCAGGCTTACGGCAATGAGGTGG 1495 



[Sequence alignment rows illegible in source] 

1850 AGGCCCGAG 1858 

1 ...CCTGCGCGGA.GGCCTCCGCAC[illegible] 
451 CCTCCTGCGCGGAGGGCCCTCGCACGGACGTCAATTCTATGACTGGCTGT 500 

47 [illegible]CGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGT[illegible] 96 
501 TCAACGTTGTGTATCCGGGACAAAAGGCGATGAGACCGGAGGACGTAGCG 550 

97 GTGGCGGTGAGGTTGTAT[illegible] 
551 GTGGCGGTGAGGTTGTATTGTGCGGAAGCTGTGCGCAGCGGGATTACGAC 600 

601 GATCAACGAAAACGCCGATTCGGCCATCTACCCAGGCAACATCGAGGCCG 650 

197 CGATGGCGGTCTATGCT[illegible] 
651 CGATGGCGGTCTATGGTGAGGTGGGTGTGAGGGTCGTCTACGCCCGCATG 700 

247 TTCTTTGATCGGATGGACGGGCGCAT[illegible] 296 
701 TTCTTTGATCGGATGGACGGGCGCATTCAAGGGTATGTGGACGCCTTGAA 750 

297 GGCTCGCTCTCC[illegible] 
751 GGCTCGCTCTCCCCAAGTCGAACTGTGCTCGATCATGGAGGAAACGGCTG 800 

347 [illegible]AAGATCGGA[illegible] 
801 TGGCCAAAGATCGGATCACAGCCCTGTCAGATCAGTATCATGGCACGGCA 850 

397 NGAGGTCCTATATCA[illegible] 
851 GGAGGTCGTATATCAGTTTGGCCCGCTCCTGCCACTACCACGGCGGTGAC 900 

447 ATTTNAANGAATTCCA[illegible] 
901 AGTTGAAGGAATGCGATGGGCACAAGCCTTCGCCCGTGATCGGGCGGTAA 950 

496 TGTNGACCCA 505 
951 TGTGGACGCTTCACATGGCGGAGAGCGATCATGATGAGCGGATTCATGGG 1000 


